Polypeptide nucleic sequences exported from mycobacteria, vectors comprising
same and uses for diagnosing and preventing tuberculosis



[001] The subject of the invention is novel recombinant screening, 
cloning and/or expression vectors which replicate in mycobacteria. Its subject is 
also a set of sequences encoding exported polypeptides which are detected by 
fusions with alkaline phosphatase and whose expression is regulated (induced or 
repressed) or constitutive during the ingestion of mycobacteria by macrophages. 
The invention also relates to a polypeptide, called DP428, of about 12 kD which 
corresponds to an exported protein found in mycobacteria belonging to the 
Mycobacterium tuberculosis complex. The invention also relates to a 
polynucleotide comprising a sequence encoding this polypeptide. It also relates 
to the use of the polypeptide or of fragments thereof and of the polynucleotides 
encoding the latter (or alternatively the polynucleotides complementary to the 
latter) for the production of means for detecting in vitro or in vivo the presence of 
a mycobacterium belonging to the Mycobacterium tuberculosis complex in a 
biological sample or for the detection of reactions of the host infected with these 
bacterial species. The invention finally relates to the use of the polypeptide or of 
fragments thereof as well as of the polynucleotides encoding the latter as means 
intended for the preparation of an immunogenic composition which is capable of 
inducing an immune response directed against the mycobacteria belonging to the 
Mycobacterium tuberculosis complex, or of a vaccine composition for the
prevention and/or treatment of infections caused by mycobacteria belonging to
said complex, in particular tuberculosis.

[002] The aim of the present invention is also to use these sequences
(polypeptide and polynucleotide sequences) as targets in the search for novel
inhibitors of the growth and multiplication of mycobacteria and of their
maintenance in the host, it being possible for these inhibitors to serve as
antibiotics.

[003] The genus Mycobacterium, which comprises at least 56 different 
species, includes major human pathogens such as M. leprae and M. 
tuberculosis, the agents responsible for leprosy and tuberculosis, which remain 
serious public health problems worldwide. 

[004] Tuberculosis continues to be a public health problem worldwide.
At present, this disease causes 2 to 3 million deaths worldwide and
about 8 million new cases are observed each year (Bouvet, 1994). In developed
countries, M. tuberculosis is the most common cause of mycobacterial infections.
In France, about 10,000 new cases appear per year and, among the notifiable
diseases, tuberculosis accounts for the highest number of cases.
Vaccination with BCG (Bacille Calmette-Guerin), an avirulent strain which is 
derived from M. bovis and which is widely used as a vaccine against 
tuberculosis, is far from being effective in all populations. This efficacy varies 
from about 80% in western countries such as England, to 0% in India (results of 
the last vaccination trial in Chingleput, published in 1972 in Indian J. Med. Res.).
Furthermore, the appearance of M. tuberculosis strains which are resistant to
antituberculars and the increased risk in immunosuppressed patients, patients
suffering from AIDS, of developing tuberculosis, make the development of rapid,
specific and reliable methods for the diagnosis of tuberculosis and the
development of novel vaccines necessary. For example, an epidemiological
study carried out in Florida, the results of which were published in 1993 in
AIDS therapies, showed that 10% of AIDS patients are affected by
tuberculosis at the time of the AIDS diagnosis or 18 months before it. In these
patients, tuberculosis appears in 60% of cases in a form which is disseminated
and therefore nondetectable by conventional diagnostic criteria such as
pulmonary radiography or the analysis of sputum.

[005] Currently, a definite diagnosis, provided by the detection of
culturable bacilli in a sample obtained from a patient, is obtained in
fewer than half of tuberculosis cases, even in the case of pulmonary
tuberculosis. The diagnosis of tuberculosis and of the other related mycobacterial
infections is therefore difficult for various reasons: mycobacteria are often
present in small quantities, their generation time is very long (24 h for M.
tuberculosis) and they are difficult to culture (Bates et al., 1986).

[006] Other techniques can be used in clinical medicine to identify a 
mycobacterial infection: 

[007] a) The direct identification of microorganisms under a microscope; 
this technique is rapid, but does not allow the identification of the mycobacterial 
species observed and lacks sensitivity (Bates, 1979). 






PATENT 
Customer No. 22,852
Attorney Docket No. 03715-0062-01

[008] Cultures, when they are positive, have a specificity approaching
100% and allow the identification of the mycobacterial species isolated; however,
as specified above, the growth of mycobacteria in vitro is slow (it requires
3 to 6 weeks of repeated cultures (Bates, 1979; Bates et al., 1986))
and expensive.

[009] b) Serological techniques are found to be useful under certain
conditions, but their use is sometimes limited by their low sensitivity and/or
specificity (Daniel et al., 1987).

[010] c) The presence of mycobacteria in a biological sample can also 
be determined by molecular hybridization with DNA or RNA using oligonucleotide
probes which are specific for the sequences tested for (Kiehn et al., 1987; 
Roberts et al., 1987; Drake et al., 1987). Several studies have shown the 
advantage of this technique for the diagnosis of mycobacterial infections. The 
probes used consist of DNA, ribosomal RNA or DNA fragments from 
mycobacteria which are obtained from gene banks. The principle of these 
techniques is based on the polymorphism of the nucleotide sequences of the 
fragments used or on the polymorphism of the adjacent regions. In all cases, 
they require the use of cultures and are not directly applicable to biological 
samples. 

[011] The low quantity of mycobacteria present in a biological sample
and consequently the low quantity of target DNA to be detected in this sample
can require the use of a specific amplification in vitro of the target DNA before its
detection with the aid of the nucleotide probe and using in vitro amplification
techniques such as PCR (polymerase chain reaction). The specific amplification
of the DNA by the PCR technique can constitute the first stage of a method for
detecting the presence of a mycobacterial DNA in a biological sample, the actual
detection of the amplified DNA being carried out in a second stage with the aid of
an oligonucleotide probe capable of specifically hybridizing with the amplified
DNA.
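
By way of illustration only, the two-stage principle described above (specific amplification, then detection of the amplified DNA with a probe) can be sketched in silico as follows. This is a minimal sketch under simplifying assumptions: primers and probe must match exactly, only a linear single template is searched, and every sequence shown is invented for the example and is not taken from any mycobacterial genome. The function names `amplify`, `probe_hybridizes` and `detect` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplify(template: str, forward: str, reverse: str):
    """Stage 1: return the top-strand amplicon bounded by both primers, or None.

    The forward primer must occur on the top strand; the reverse primer
    anneals to the top strand, so its reverse complement is searched for
    downstream of the forward primer site.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def probe_hybridizes(amplicon: str, probe: str) -> bool:
    """Stage 2: exact-match stand-in for specific hybridization of a probe."""
    return probe in amplicon or reverse_complement(probe) in amplicon


def detect(template: str, forward: str, reverse: str, probe: str) -> bool:
    """Report a positive result only if both stages succeed."""
    amplicon = amplify(template, forward, reverse)
    return amplicon is not None and probe_hybridizes(amplicon, probe)


# Invented example sequences (assumptions, not real target sequences).
TEMPLATE = "TTGACCTAGGCATCGATTACGGATCCAAGTCTGACCTT"
FORWARD = "GACCTAGG"
REVERSE = "GTCAGACT"
PROBE = "CGATTACG"

if __name__ == "__main__":
    print(detect(TEMPLATE, FORWARD, REVERSE, PROBE))
```

Requiring both stages to succeed reflects the point made in the text: the amplification alone does not constitute the detection, which is carried out by a probe specific for the amplified DNA. Real hybridization tolerates some mismatches, which this exact-match sketch does not model.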

[012] A test for the detection of mycobacteria belonging to the 
Mycobacterium tuberculosis complex, by sandwich hybridization (test using a 
capture probe and a detection probe) was described by Chevrier et al. in 1993.
The Mycobacterium tuberculosis complex is a group of mycobacteria which
comprises M. bovis-BCG, M. bovis, M. tuberculosis, M. africanum and M. microti.

[013] A method for the detection of low quantities of mycobacteria,
belonging to the tuberculosis complex, by gene amplification and direct
hybridization on biological samples has been developed. Said method uses the
insertion sequence IS6110 (European Patent EP 0,490,951 B1). Thierry et al.
described in 1990 a sequence which is specific to the Mycobacterium
tuberculosis complex and which is called IS6110. Some authors have proposed
specifically amplifying the DNA obtained from Mycobacterium using nucleic
primers in an amplification method, such as the polymerase chain reaction
(PCR). Patel et al. described in 1990 the use of several nucleic primers chosen
from a sequence known as a probe in the identification of M. tuberculosis.
However, the length of the fragments obtained using these primers was different
from the expected theoretical length and several fragments of variable size were
obtained. Furthermore, the authors observed the absence of hybridization of the
amplified products with the plasmid which served to determine the primers.
These results indicate that these primers might not be appropriate in the
detection of the presence of M. tuberculosis in a biological sample and confirm
the critical nature of the choice of the primers. The same year, J.L. Guesdon and
D. Thierry described a method for the detection of M. tuberculosis, having a high
sensitivity, by amplification of an M. tuberculosis DNA fragment located within the
IS6110 sequence (European Patent EP 461,045) with the aid of primers
generating amplified DNA fragments of constant length, even when the choice of
the primers led to the amplification of long fragments (of the order of 1000 to
1500 bases) where the risk of interruption of the polymerization is high because
of the effects of the secondary structure of the sequence. Other primers specific
for the IS6110 sequence are described in European Patent No. EP-0,490,951.

[014] The inventors have shown (unpublished results) that some clinical
isolates of Mycobacterium tuberculosis lacked the insertion sequence IS6110
and could therefore not be detected with the aid of oligonucleotides specific for
this sequence, which could thus lead to false-negative diagnostic results. These
results confirm a similar observation made by Yuen et al. in 1993. The
impossibility of detecting these pathogenic strains which are potentially present in
a biological sample collected from a patient is thus likely to lead to diagnostic
difficulties or even to diagnostic errors. The availability of several sequences
specific for the tubercle bacillus, within which primers appropriate for
amplification will be chosen, is important. The DP428 sequence described here
may be used.

[015] M. bovis and M. tuberculosis, the causative agents of 
tuberculosis, are facultative intracellular bacteria. 

[016] These agents have developed mechanisms to ensure their
survival and their replication inside macrophages, one of the cell types which is
supposed to eradicate invading microorganisms. These agents are capable of
modulating the normal development of their phagosome and of preventing it
from differentiating into an acidic compartment rich in hydrolases
(Clemens, 1979; Clemens et al., 1996; Sturgill-Koszycki et al., 1994 and Xu et
al., 1994). However, this modulation is only possible if the bacterium is alive
inside the phagosome, suggesting that compounds which are actively
synthesized and/or secreted inside the cell are part of this mechanism. Exported
proteins are probably involved in this mechanism. Despite major health
problems linked to these pathogenic organisms, little is known about their exported
and/or secreted proteins. SDS-PAGE analyses of M. tuberculosis culture filtrate
show at least 30 secreted proteins (Altschul et al., 1990; Nagai et al., 1991 and
Young et al., 1992). Some of them have been characterized, their genes cloned
and sequenced (Borremans et al., 1989; Wiker et al., 1992 and Yamaguchi et al.,
1989). Others, although being immunodominant antigens of major importance
for inducing a protective immunity (Anderson et al., 1991 and Orme et al., 1993),
have not been completely identified. In addition, it is probable that many
exported proteins remain attached to the cell membrane and are consequently
not present in the culture supernatants. It has been shown that the proteins
located at the outer surface of various pathogenic bacteria, such as the 103 kDa
invasin of Yersinia pseudotuberculosis (Isberg et al., 1987) or the 80 kDa
internalin of Listeria monocytogenes (Gaillard et al., 1991 and Dramsi et al.,
1997) play an important role in the interactions with the host cells and,
consequently, in the pathogenicity as well as in the induction of protective
responses. Thus, a protein which is bound to the membrane would be important
for the M. tuberculosis infection as well as for the induction of a protective
response against this infection. These proteins could certainly be of interest for
the preparation of vaccines.

[017] Recently, the adaptation, to mycobacteria, of a genetic
methodology for the identification and the phenotypic selection of exported proteins
has been described (Lim et al., 1995). This method uses E. coli periplasmic
alkaline phosphatase (PhoA). A plasmid vector was constructed which allows
the fusion of genes between a truncated PhoA gene and genes encoding
exported proteins (Manoil et al., 1990).

[018] Using this method, it has been possible to identify an M.
tuberculosis gene (erp (Berthet et al., 1995)) exhibiting homologies with a 28 kDa
exported protein of M. leprae, which is a frequent target of humoral responses in
the lepromatous form of leprosy. A protein having amino acid motifs which are
characteristic of plant desaturases (des) has also been characterized by the
technique of fusion with PhoA.

[019] However, this genetic method for identifying exported proteins
does not make it possible to easily evaluate the intracellular expression of the
corresponding genes. Such an evaluation is of crucial importance both for
selecting good candidate vaccines and for understanding the interactions
between bacteria and their host cells. The induction of the expression of
virulence factors upon contact between the pathogen and its target cell has been
described. This is the case, for example, for the Yersinia pseudotuberculosis Yops
virulence factors (Petersson et al., 1996). Shigella, upon contact with the target
cells, releases the Ipa proteins into the culture medium, and Salmonella
synthesizes novel surface structures.

[020] Taking into account the preceding text, a great need currently
exists for developing novel vaccines against pathogenic mycobacteria as well as
novel specific, reliable and rapid diagnostic tests. These developments require
the design of even more efficient specific tools which make it possible, on the
one hand, to isolate or to obtain sequences of novel specific, in particular
immunogenic, polypeptides, and, on the other hand, to better understand the
mechanism of the interactions between bacteria and their host cells such as in
particular the induction of the expression of virulence factors. This is precisely the
object of the present invention.

[021] The inventors have defined and produced, for this purpose, novel
vectors allowing the screening, cloning and/or expression of mycobacterial DNA
sequences so as to identify, among these sequences, nucleic acids encoding
proteins of interest, preferably exported proteins, which may be located on the
bacterial membrane, and/or secreted proteins, and to identify among these
sequences those which are induced or repressed during infection (intracellular
growth).

Description 

[022] The present invention describes the use of the reporter gene 
phoA in mycobacteria. It makes it possible to identify systems for expression and 
export in a mycobacterial context. Many genes are only expressed in such a 
context, which shows the advantage of the present invention. During the cloning 
of DNA segments of strains of the M. tuberculosis complex fused with phoA into
another mycobacterium such as M. smegmatis, the beginning of the gene, its
regulatory regions and its regulator will be cloned, which will make it possible to
observe a regulation. If this regulation is positive, the cloning of the regulator will
constitute an advantage for observing the expression and the export.

[023] In the context of the invention, mycobacterium is understood to 
mean all the mycobacteria belonging to the various species listed by Wayne L.G.
and Kubica G.P. (1980). Family Mycobacteriaceae in Bergey's manual of
systematic bacteriology, J. P. Butler Ed. (Baltimore USA: Williams and Wilkins, p.
1436-1457).

[024] In some cases, the cloned genes are subjected in their original 
host to a negative regulation which makes the observation of the expression and 
of the export difficult in the original host. In this case, the cloning of the gene in 
the absence of its negative regulator, into a host not containing it, will constitute 
an advantage.

[025] The invention also relates to novel mycobacterial polypeptides
and to novel mycobacterial polynucleotides which may have been isolated by
means of the preceding vectors and which are capable of entering into the
preparation of compositions for the detection of a mycobacterial infection, or for
the protection against an infection caused by mycobacteria or for the search for
inhibitors as is described above for DP428.

[026] The subject of the invention is therefore a recombinant screening, 
cloning and/or expression vector, characterized in that it replicates in 
mycobacteria and in that it contains: 



[027] 1) a replicon which is functional in mycobacteria;

[028] 2) a selectable marker;

[029] 3) a reporter cassette comprising:

[030] a) a multiple cloning site (polylinker),

[031] b) optionally a transcription terminator which is active in
mycobacteria, upstream of the polylinker,

[032] c) a coding nucleotide sequence which is derived from a
gene encoding a protein expression, export and/or secretion marker, said
nucleotide sequence lacking its initiation codon and its regulatory sequences,
and

[033] d) a coding nucleotide sequence derived from a gene
encoding a marker for the activity of promoters which are contained in the same
fragment, said nucleotide sequence lacking its initiation codon. Optionally, the
recombinant vector also contains a replicon which is functional in E. coli.

[034] Preferably, the export and/or secretion marker is placed in the
same orientation as the promoter activity marker.

[035] Preferably, the recombinant screening vector according to the 
invention comprises, in addition, a transcription terminator placed downstream of 
the promoter activity marker, which is likely to allow the production of short 
transcripts which are found to be more stable and which consequently allow a 
higher level of expression of the products of translation. 

[036] The export and/or secretion marker is a nucleotide sequence 
whose expression, followed by export and/or secretion, depends on the 
regulatory elements which control its expression. 

[037] "Sequences or elements for regulating the expression of the 
production of polypeptides and its location" is understood to mean a 
transcriptional promoter sequence, a sequence comprising the ribosome-binding 
site (RBS), the sequences responsible for export and/or secretion such as the 
sequence termed signal sequence.

[038] A first advantageous export and/or expression marker is a coding 
sequence derived from the phoA gene. Where appropriate, it is truncated such 
that the alkaline phosphatase activity is nevertheless capable of being restored 
when the truncated coding sequence is placed under the control of a promoter 
and of appropriate regulatory elements. 

[039] Other exposure, export and/or secretion markers may be used. 
There may be mentioned, by way of examples, a sequence of the gene for
β-agarase, for the nuclease of a staphylococcus or for a β-lactamase.

[040] Among the advantageous markers for the activity of promoters
which are contained in the same fragment, a coding sequence derived from the
firefly luciferase luc gene, provided with its initiation codon, is preferred.

[041] Other markers for the activity of promoters which are contained in 
the same fragment may be used. There may be mentioned, by way of examples, 
a sequence of the gene for GFP (Green Fluorescent Protein). 

[042] The transcription terminator should be functional in mycobacteria. 
An advantageous terminator is, in this regard, the T4 coliphage terminator (tT4).
Other appropriate terminators for carrying out the invention may be isolated using 
the technique presented in the examples, for example by means of an "omega" 
cassette (Prentki et al., 1984). 

[043] A vector which is particularly preferred for carrying out the 
invention is a plasmid chosen from the following plasmids which have been 
deposited at the CNCM (Collection Nationale de Cultures de Microorganismes,
25 rue du Docteur Roux, 75724 Paris cedex 15, France):

[044] a) pJVEDa which was deposited at the CNCM under the No. I-1797,
on 12/12/1996,

[045] b) pJVEDb which was deposited at the CNCM under the No. I-1906,
on 25 July 1997,

[046] c) pJVEDc which was deposited at the CNCM under the No. I-1799,
on 12/12/1996.

[047] For the selection or the identification of mycobacterial nucleic acid
sequences encoding polypeptides which are capable of being incorporated into
immunogenic or antigenic compositions for the detection of an infection, or which
are capable of inducing or repressing a mycobacterial virulence factor, the vector
of the invention will comprise, at one of the multiple cloning sites of the polylinker,
a nucleotide sequence of a mycobacterium in which the detection is carried out
of the presence of sequences corresponding to exported and/or secreted
polypeptides which may be induced or repressed during the infection, or
alternatively expressed or produced constitutively, their associated promoter
and/or regulatory sequences which are capable of allowing or promoting the
export and/or the secretion of said polypeptides of interest, or all or part of the
genes of interest encoding said polypeptides.

[048] Preferably, this sequence is obtained by physical fragmentation or
by enzymatic digestion of the genomic DNA or of the DNA which is
complementary to an RNA of a mycobacterium, preferably M. tuberculosis or
chosen from M. africanum, M. bovis, M. avium or M. leprae.

[049] The vectors of the invention may indeed also be used to
determine the presence of sequences of interest, preferably corresponding to
exported and/or secreted proteins, and/or capable of being induced or repressed
or produced constitutively during the infection, in particular during phagocytosis
by the macrophages, and, according to what was previously disclosed, in
mycobacteria such as M. africanum, M. bovis, M. avium or M. leprae whose DNA
or cDNA will have been treated by physical fragmentation or with defined
enzymes.

[050] According to a first embodiment of the invention, the enzymatic
digestion of the genomic DNA or of the complementary DNA is carried out using
M. tuberculosis.

[051] Preferably, this DNA is digested with an enzyme such as Sau3A,
BclI or BglII.

[052] Other digestive enzymes such as ScaI, ApaI, SacII or KpnI or
alternatively nucleases or polymerases can naturally be used as long as they
allow the production of fragments whose ends can be inserted into one of the
cloning sites of the polylinker of the vector of the invention.

[053] Where appropriate, the digestions with various enzymes will be 
carried out simultaneously. 
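
As an illustration of the digestion described above, the following minimal sketch computes, in silico, the fragments produced when a linear DNA sequence is cut with Sau3A, BclI and/or BglII, alone or simultaneously. The recognition sites used (Sau3A: ^GATC; BclI: T^GATCA; BglII: A^GATCT) are the commonly documented ones, each leaving a 5'-GATC overhang on the top strand. The function names and the test sequence are hypothetical, and only top-strand cut positions on a linear molecule are modelled.

```python
# Recognition site and top-strand cut offset (number of bases before the cut).
ENZYMES = {
    "Sau3A": ("GATC", 0),    # ^GATC
    "BclI": ("TGATCA", 1),   # T^GATCA
    "BglII": ("AGATCT", 1),  # A^GATCT
}


def cut_positions(seq: str, site: str, offset: int) -> list:
    """Return every top-strand cut position for one recognition site."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)  # step by one so overlapping sites are found
    return positions


def digest(seq: str, enzymes: list) -> list:
    """Cut a linear sequence with one or several enzymes at the same time."""
    cuts = sorted({p for name in enzymes
                   for p in cut_positions(seq, *ENZYMES[name])})
    bounds = [0] + cuts + [len(seq)]
    # Drop the empty fragment that would arise from a cut at position 0.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    example = "AAGATCTTTTGATCACCGATCGG"  # invented sequence
    for combo in (["Sau3A"], ["BglII"], ["BclI", "BglII"]):
        print(combo, digest(example, combo))
```

Because the BclI and BglII sites both contain GATC and are cut at the same position as Sau3A within it, every fragment end produced by any of the three enzymes carries the same GATC overhang in this model, which is consistent with their being listed together as interchangeable choices.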

[054] Recombinant vectors which are preferred for carrying out the 
invention are chosen from the following recombinant vectors which have been 
deposited at the CNCM: 

[055] a) p6D7 which was deposited on 28 January 1997 at the
CNCM under the No. I-1814,

[056] b) p5A3 which was deposited on 28 January 1997 at the
CNCM under the No. I-1815,

[057] c) p5F6 which was deposited on 28 January 1997 at the CNCM
under the No. I-1816,

[058] d) p2A29 which was deposited on 28 January 1997 at the
CNCM under the No. I-1817,

[059] e) pDP428 which was deposited on 28 January 1997 at the
CNCM under the No. I-1818,

[060] f) p5B5 which was deposited on 28 January 1997 at the
CNCM under the No. I-1819,

[061] g) p1C7 which was deposited on 28 January 1997 at the
CNCM under the No. I-1820,

[062] h) p2D7 which was deposited on 28 January 1997 at the
CNCM under the No. I-1821,

[063] i) p1B7 which was deposited on 31 January 1997 at the
CNCM under the No. I-1843,

[064] j) pJVED/M. tuberculosis which was deposited on 25 July 1997
at the CNCM under the No. I-1907,

[065] k) pM1C25 which was deposited on 4 August 1998 at the
CNCM under the No. I-2062.

[066] Among these, the recombinant vector
pDP428 which was deposited on 28 January 1997 at the CNCM under the No. I-1818,
and the vector pM1C25 which was deposited on 4 August 1998 at the
CNCM under the No. I-2062 are most preferred.

[067] The subject of the invention is also a method of screening 
nucleotide sequences derived from mycobacteria in order to determine the
presence of sequences corresponding to exported and/or secreted polypeptides
which may be induced or repressed during the infection, their associated
promoter and/or regulatory sequences which are capable in particular of allowing
or promoting the export and/or secretion of said polypeptides of interest, or all or
part of genes of interest encoding said polypeptides, characterized in that it uses
a recombinant vector according to the invention.

[068] The invention also relates to a method of screening, according to 
the invention, characterized in that it comprises the following steps: 

[069] a) physical fragmentation of the mycobacterial DNA sequences or 
their digestion with at least one defined enzyme and recovery of the fragments 
obtained; 

[070] b) insertion of the fragments obtained in step a) into a cloning site, 
which is compatible, where appropriate, with the enzyme of step a), of the 
polylinker of a vector according to the invention; 

[071] c) if necessary, amplification of said fragments contained in the 
vector, for example by replication of the latter after insertion of the vector thus 
modified into a defined cell, preferably E. coli;

[072] d) transformation of the host cells with the vector amplified in step 
c), or in the absence of amplification, with the vector of step b); 

[073] e) culture of the transformed host cells in a medium allowing the 
detection of the export and/or secretion marker, and/or of the promoter activity 
marker which is contained in the vector; 

[074] f) detection of the host cells which are positive (positive colonies) 
for the expression of the export and/or secretion marker, and/or of the promoter 
activity marker;

[075] g) isolation of the DNA from the positive colonies and insertion of
this DNA into a cell which is identical to that in step c);

[076] h) selection of the inserts contained in the vector, allowing the 
production of clones which are positive for the export and/or secretion marker, 
and/or for the promoter activity marker; 

[077] i) isolation and characterization of the mycobacterial DNA 
fragments contained in these inserts. 

[078] In one of the preferred embodiments of the screening method 
according to the invention, the host cells, detected in step f), which are positive 
for the export and/or secretion marker are, optionally in a second stage, tested 
for the capacity of the selected nucleotide insert to stimulate the expression of 
the promoter activity marker when said host cells are phagocytosed by 
macrophage-type cells. 

[079] More specifically, the stimulation of the expression of the promoter 
activity marker in host cells placed in axenic culture (host cells alone in culture) is 
compared with the stimulation of the expression of the promoter activity marker in 
host cells cultured in the presence of macrophages and which are thus 
phagocytosed by the latter. 
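
The comparison described above can be sketched numerically as follows. This is a minimal sketch resting on assumptions that are not stated in the text: the promoter activity marker is read as a light signal (relative light units), each reading is normalized to the number of viable bacteria (colony-forming units), and a twofold change is taken as the cut-off for calling an insert induced or repressed. The names `Reading`, `specific_activity` and `classify`, as well as the threshold value, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One measurement of the promoter activity marker (assumed units)."""
    rlu: float  # relative light units measured for the sample
    cfu: float  # viable bacteria (colony-forming units) in the same sample


def specific_activity(reading: Reading) -> float:
    """Marker signal per viable bacterium."""
    if reading.cfu <= 0:
        raise ValueError("cfu must be positive")
    return reading.rlu / reading.cfu


def classify(axenic: Reading, intracellular: Reading, threshold: float = 2.0):
    """Compare intracellular with axenic expression.

    Returns (ratio, label), where ratio is intracellular specific activity
    divided by axenic specific activity. The threshold is an arbitrary,
    illustrative cut-off and must be greater than 1.
    """
    if threshold <= 1:
        raise ValueError("threshold must be greater than 1")
    baseline = specific_activity(axenic)
    if baseline <= 0:
        raise ValueError("axenic activity must be positive")
    ratio = specific_activity(intracellular) / baseline
    if ratio >= threshold:
        return ratio, "induced"
    if ratio <= 1 / threshold:
        return ratio, "repressed"
    return ratio, "constitutive"


if __name__ == "__main__":
    # Invented readings, for illustration only.
    print(classify(Reading(rlu=1000.0, cfu=1e6), Reading(rlu=5000.0, cfu=1e6)))
```

Normalizing to viable counts keeps a difference in bacterial numbers between the axenic culture and the infected macrophages from being mistaken for a change in promoter activity.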

[080] The selection of host cells which are positive for the promoter 
activity marker can be carried out immediately after step e) of the method of 
screening described above, or alternatively after any one of steps f), g), h) or i), 
that is to say once the host cells have been positively selected for the export 
and/or secretion marker.

[081] The use of this method allows the construction of DNA libraries
comprising sequences corresponding to polypeptides which are capable of being
exported and/or secreted, and/or which are capable of being induced or
repressed during the infection when they are produced inside recombinant
mycobacteria. Step i) of the method may comprise a step for sequencing the
inserts selected.

[082] Preferably, in the method according to the invention, the vector
used is chosen from the plasmids pJVEDa (CNCM, No. I-1797), pJVEDb
(CNCM, No. I-1906), pJVEDc (CNCM, No. I-1799) or pJVED/M. tuberculosis
(CNCM, No. I-1907), and the digestion of the mycobacterial DNA sequences is
carried out by means of the enzyme Sau3A.

[083] According to a preferred embodiment of the invention, the method
of screening is characterized in that the mycobacterial sequences are derived
from a pathogenic mycobacterium, for example from M. tuberculosis, M. bovis,
M. avium, M. africanum or M. leprae.

[084] The invention also comprises a library of genomic DNA or of 
cDNA which is complementary to mycobacterial mRNA, characterized in that it is 
obtained by a method comprising steps a) and b) or a), b) and c) of the preceding 
method according to the invention, preferably a library of genomic DNA or of 
cDNA which is complementary to mRNA of pathogenic mycobacteria, preferably 
of mycobacteria belonging to the Mycobacterium tuberculosis complex group, 
preferably of Mycobacterium tuberculosis.

[085] In the present invention, "nucleic sequences" or "amino acid
sequences" are understood to designate SEQ ID No. X to SEQ ID No. Y, where
X and Y may independently represent a number or an alphanumeric character,
respectively the set of nucleic sequences or the set of amino acid sequences
represented by figures X to Y, ends included.

[086] For example, the nucleic sequences or the amino acid sequences 
SEQ ID NOS: 1-87 are respectively the nucleic sequences or the amino acid 
sequences represented by figures 1 to 4N. 

[087] The subject of the invention is also the nucleotide sequences of 
mycobacteria or comprising nucleotide sequences of mycobacteria selected after 
carrying out the method according to the invention which is described above. 

[088] Preferably, said mycobacterium is chosen from M. tuberculosis, 
M. bovis, M. africanum, M. avium, M. leprae, M. paratuberculosis, M. kansasii or
M. xenopi.

[089] The nucleotide sequences of mycobacteria or comprising a 
mycobacterial nucleotide sequence are preferred, said mycobacterial nucleotide 
sequence being chosen from the sequences of mycobacterial DNA fragments 
having the nucleic sequences SEQ ID NOS: 1, 8, 14, 25, 31, 33, 35, 41, 46, 52,
56, 62, 64, 67, 69, 72, 74, 76, 78, 81, 84, 86, 88, 90, 92, 96, 98, 100, 104, 106,
108, 110, 113, 119, 122, 128, 133, 137, 139, 141, 143, 145, 148, 150, 152, 154,
156, 158, 160, 162, 165, 169, 177, 184, 189, 195, 200, 202, 206, 209, 211, 213,
217, 220, 225, 228, 238, 246, 250, 255, 258, 260, 262, 268, 274, 278, 280, 282,
284, 286, 288, 290, 297, 310, 317, 321, 323, 325, 327, 331, 333, 335, 337, 339,
346, 347, 353, 357, 359, 361, 364, 368, 371, 374, 380, 383, 385, 387, 389, 393,
395, 397, 399, 403, 405, 407, 410, 412, 419, 421, 426, 429, 431, 433, 437, 441,
447, 452, 456, 459, 461, 463, 469, 472, 474, 476, 482, 485, 487, 489, 495, 497,
501, 505, 510, 516, 519, 522, 530, 534, 537, 544, 546, 550, 552, 554, 556, 558,
564, 569, 571, 573, 576, 580, 584, 586, 588, 590, 594, 596, 598, 600, 604, 608,
610, 612, 614, 616, 618, 620, 622, 624, 626, 629, 631, 633, 635, 640, 647, 649,
651, 653, 657, 660, 662, 664, 666, 669, 674, 676, 678, 683, 686, 691, 693, 695,
697, 702, 717, 728, 733, 736, 739, 741, 743, 746, 752, 755, 757, 759, 761, 764,
767, 769, 771, 784, 794, 805, 807, 809, 811, 813, 817, 821, 823, 825, 827, 831,
833, 835, 837, 839, 842, 844, 846, 848, 864, 878, 883, 885, 887, 895, 901, 907
and 909, which are represented respectively by Figures 1 to 24C (plates 1 to
150), by Figures 27A to 27C (plates 152 to 154), by Figure 29 (plate 156) and by
Figures 31A to 50F (plates 158 to 275).

[090] According to a specific embodiment of the invention, preferred
sequences are, for example, the mycobacterial DNA fragments having the
sequence SEQ ID NO: 1, which is contained in the vector pDP428 (CNCM,
No. I-1818), SEQ ID NO: 41, which is contained in the vector p6D7 (CNCM,
No. I-1814), SEQ ID NOS: 88 and 96, which are contained in the vector p5F6
(CNCM, No. I-1816), SEQ ID NO: 110, which is contained in the vector p2A29
(CNCM, No. I-1817), SEQ ID NO: 122, which is contained in the vector p5B5
(CNCM, No. I-1819), SEQ ID NOS: 137 and 143, which are contained in the
vector p1C7 (CNCM, No. I-1820), SEQ ID NO: 158, which is contained in the
vector p2D7 (CNCM, No. I-1821), SEQ ID NO: 165, which is contained in the
vector p1B7 (CNCM, No. I-1843), SEQ ID NO: 530, which is contained in the
vector p5A3 (CNCM, No. I-1815), or SEQ ID NO: 544, which is contained in the
vector pM1C25 (CNCM, No. I-2062).

[091] The invention also relates to a nucleic acid comprising the entire
open reading frame of one of the nucleotide sequences according to the
invention, in particular one of the sequences SEQ ID NOS: 1, 8, 14, 25, 31, 33,
35, 41, 46, 52, 56, 62, 64, 67, 69, 72, 74, 76, 78, 81, 84, 86, 88, 90, 92, 96, 98,
100, 104, 106, 108, 110, 113, 119, 122, 128, 133, 137, 139, 141, 143, 145, 148,
150, 152, 154, 156, 158, 160, 162, 165, 169, 177, 184, 189, 195, 200, 202, 206,
209, 211, 213, 217, 220, 225, 228, 238, 246, 250, 255, 258, 260, 262, 268, 274,
278, 280, 282, 284, 286, 288, 290, 297, 310, 317, 321, 323, 325, 327, 331, 333,
335, 337, 339, 346, 347, 353, 357, 359, 361, 364, 368, 371, 374, 380, 383, 385,
387, 389, 393, 395, 397, 399, 403, 405, 407, 410, 412, 419, 421, 426, 429, 431,
433, 437, 441, 447, 452, 456, 459, 461, 463, 469, 472, 474, 476, 482, 485, 487,
489, 495, 497, 501, 505, 510, 516, 519, 522, 530, 534, 537, 544, 546, 550, 552,
554, 556, 558, 564, 569, 571, 573, 576, 580, 584, 586, 588, 590, 594, 596, 598,
600, 604, 608, 610, 612, 614, 616, 618, 620, 622, 624, 626, 629, 631, 633, 635,
640, 647, 649, 651, 653, 657, 660, 662, 664, 666, 669, 674, 676, 678, 683, 686,
691, 693, 695, 697, 702, 717, 728, 733, 736, 739, 741, 743, 746, 752, 755, 757,
759, 761, 764, 767, 769, 771, 784, 794, 805, 807, 809, 811, 813, 817, 821, 823,
825, 827, 831, 833, 835, 837, 839, 842, 844, 846, 848, 864, 878, 883, 885, 887,
895, 901, 907, and 909 according to the invention.






PATENT 
Customer No. 22,852 
Attorney Docket No. 03715-0062-01 

[092] Said nucleic acid may be isolated, for example, in the following 

manner: 

[093] a) preparation of a cosmid library from the M. tuberculosis 
DNA, for example according to the technique described by Jacobs et al., 1991;

[094] b) hybridization of all or part of a probe nucleic acid having the
sequences chosen, for example, from SEQ ID NOS: 1, 8, 14, 25, 31, 33, 35, 41,
46, 52, 56, 62, 64, 67, 69, 72, 74, 76, 78, 81, 84, 86, 88, 90, 92, 96, 98, 100, 104,
106, 108, 110, 113, 119, 122, 128, 133, 137, 139, 141, 143, 145, 148, 150, 152,
154, 156, 158, 160, 162, 165, 169, 177, 184, 189, 195, 200, 202, 206, 209, 211,
213, 217, 220, 225, 228, 238, 246, 250, 255, 258, 260, 262, 268, 274, 278, 280,
282, 284, 286, 288, 290, 297, 310, 317, 321, 323, 325, 327, 331, 333, 335, 337,
339, 346, 347, 353, 357, 359, 361, 364, 368, 371, 374, 380, 383, 385, 387, 389,
393, 395, 397, 399, 403, 405, 407, 410, 412, 419, 421, 426, 429, 431, 433, 437,
441, 447, 452, 456, 459, 461, 463, 469, 472, 474, 476, 482, 485, 487, 489, 495,
497, 501, 505, 510, 516, 519, 522, 530, 534, 537, 544, 546, 550, 552, 554, 556,
558, 564, 569, 571, 573, 576, 580, 584, 586, 588, 590, 594, 596, 598, 600, 604,
608, 610, 612, 614, 616, 618, 620, 622, 624, 626, 629, 631, 633, 635, 640, 647,
649, 651, 653, 657, 660, 662, 664, 666, 669, 674, 676, 678, 683, 686, 691, 693,
695, 697, 702, 717, 728, 733, 736, 739, 741, 743, 746, 752, 755, 757, 759, 761,
764, 767, 769, 771, 784, 794, 805, 807, 809, 811, 813, 817, 821, 823, 825, 827,
831, 833, 835, 837, 839, 842, 844, 846, 848, 864, 878, 883, 885, 887, 895, 901,
907, 909, with the cosmids of the library previously prepared in step a);

[095] c) selection of the cosmids hybridizing with the probe nucleic
acid of step b);

[096] d) sequencing of the DNA inserts of the clones selected in 
step c) and identification of the complete open reading frame; 

[097] e) where appropriate, cloning of the inserts sequenced in 
step d) into an appropriate expression and/or cloning vector. 

[098] The nucleic acids comprising the entire open reading frame of the
sequences SEQ ID NOS: 1, 8, 14, 25, 31, 33, 35, 41, 46, 52, 56, 62, 64, 67, 69,
72, 74, 76, 78, 81, 84, 86, 88, 90, 92, 96, 98, 100, 104, 106, 108, 110, 113, 119,
122, 128, 133, 137, 139, 141, 143, 145, 148, 150, 152, 154, 156, 158, 160, 162,
165, 169, 177, 184, 189, 195, 200, 202, 206, 209, 211, 213, 217, 220, 225, 228,
238, 246, 250, 255, 258, 260, 262, 268, 274, 278, 280, 282, 284, 286, 288, 290,
297, 310, 317, 321, 323, 325, 327, 331, 333, 335, 337, 339, 346, 347, 353, 357,
359, 361, 364, 368, 371, 374, 380, 383, 385, 387, 389, 393, 395, 397, 399, 403,
405, 407, 410, 412, 419, 421, 426, 429, 431, 433, 437, 441, 447, 452, 456, 459,
461, 463, 469, 472, 474, 476, 482, 485, 487, 489, 495, 497, 501, 505, 510, 516,
519, 522, 530, 534, 537, 544, 546, 550, 552, 554, 556, 558, 564, 569, 571, 573,
576, 580, 584, 586, 588, 590, 594, 596, 598, 600, 604, 608, 610, 612, 614, 616,
618, 620, 622, 624, 626, 629, 631, 633, 635, 640, 647, 649, 651, 653, 657, 660,
662, 664, 666, 669, 674, 676, 678, 683, 686, 691, 693, 695, 697, 702, 717, 728,
733, 736, 739, 741, 743, 746, 752, 755, 757, 759, 761, 764, 767, 769, 771, 784,
794, 805, 807, 809, 811, 813, 817, 821, 823, 825, 827, 831, 833, 835, 837, 839,
842, 844, 846, 848, 864, 878, 883, 885, 887, 895, 901, 907, 909, are among the
preferred nucleic acids.

[099] The present invention makes it possible to determine a gene
fragment encoding an exported polypeptide. Comparison with the genome
sequence published by Cole et al. (Cole et al., 1998, Nature, 393, 537-544)
makes it possible to determine the whole gene carrying the identified sequence
according to the present invention.

[0100] Nucleotide sequence comprising the entire open reading frame of
a sequence according to the invention is understood to mean the nucleotide 
sequence (genomic, cDNA, semisynthetic or synthetic) comprising one of the 
sequences according to the invention and extending, on the one hand, in 5' of 
these sequences up to the first codon for initiation of translation (ATG or GTG) or 
even up to the first stop codon, and, on the other hand, in 3' of these sequences 
up to the next stop codon, this being in any one of the three possible reading 
frames. 
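
The extension rule in this definition can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name extend_to_orf, the 0-based half-open coordinates and the toy sequence are hypothetical, the fragment is assumed to begin on a codon boundary, and the 5' walk stops at the nearest in-frame ATG or GTG (falling back to the codon that follows the upstream stop codon).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # initiation codons named in the text


def extend_to_orf(genome, frag_start, frag_end):
    """Extend an in-frame fragment to its enclosing open reading frame.

    Coordinates are 0-based and half-open (a hypothetical convention);
    frag_start is assumed to be the first base of a codon.
    """
    genome = genome.upper()

    # 5' side: step back codon by codon until a start or a stop codon is met.
    orf_start = frag_start % 3  # fallback: most upstream in-frame position
    pos = frag_start
    while pos >= 0:
        codon = genome[pos:pos + 3]
        if codon in START_CODONS:
            orf_start = pos
            break
        if codon in STOP_CODONS:
            orf_start = pos + 3  # first codon after the upstream stop
            break
        pos -= 3

    # 3' side: step forward from the last complete codon boundary of the fragment.
    pos = frag_end - (frag_end - frag_start) % 3
    while pos + 3 <= len(genome):
        if genome[pos:pos + 3] in STOP_CODONS:
            pos += 3  # include the stop codon in the ORF
            break
        pos += 3
    return orf_start, pos


# Toy sequence (not a real mycobacterial sequence): stop, GTG start, 4 codons, stop.
toy = "CC" + "TAG" + "GTG" + "AAA" + "CCC" + "GGG" + "TTT" + "TAA" + "CC"
start, end = extend_to_orf(toy, 11, 17)  # the known fragment is "CCCGGG"
print(toy[start:end])  # GTGAAACCCGGGTTTTAA
```

In practice the same walk would be repeated for each of the three reading frames, as the definition requires.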

[0101] The nucleotide sequences which are complementary to the above
sequences according to the invention also form part of the invention. 

[0102] Polynucleotide having a sequence which is complementary to a
nucleotide sequence according to the invention is understood to mean any DNA
or RNA sequence whose nucleotides are complementary to those of said
sequence according to the invention and whose orientation is reversed.

[0103] The nucleotide fragments of the above sequences according to
the invention, which are in particular useful as probes or primers, also form part
of the invention.
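
The notion of a complementary sequence used above (complementary nucleotides, reversed orientation) corresponds to what is usually called the reverse complement. A minimal Python sketch, restricted to DNA letters and using a hypothetical helper name, might look like this:

```python
# Map each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read in the reversed orientation."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```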

[0104] The invention also relates to the polynucleotides, characterized in 
that they comprise a polynucleotide chosen from: 

[0105] a) a polynucleotide whose sequence is complementary to the 
sequence of a polynucleotide according to the invention, 

[0106] b) a polynucleotide whose sequence comprises at least 50%
identity with a polynucleotide according to the invention, 

[0107] c) a polynucleotide which hybridizes, under high stringency 
conditions, with a polynucleotide sequence according to the invention, 

[0108] d) a fragment of at least 8 consecutive nucleotides of a
polynucleotide defined according to the invention. 

[0109] The high stringency conditions as well as the percentage identity 
will be defined below in the present description. 

[0110] When the coding sequence derived from the export and/or
secretion marker gene is a sequence derived from the phoA gene, the export 
and/or secretion of the product of the phoA gene, truncated where appropriate, is 
obtained only when this sequence is inserted in phase with the sequence or 
element for regulating the expression of the production of polynucleotides and its 
location placed upstream, which contains the elements controlling the 
expression, export and/or secretion which are derived from a mycobacterial 
sequence.

[0111] The recombinant vectors of the invention may of course comprise
multiple cloning sites which are shifted by one or two nucleotides relative to a
vector according to the invention, thus making it possible to express the
polypeptide corresponding to the mycobacterial DNA fragment which is inserted
and which is capable of being translated according to one of the three possible
reading frames.

[0112] For example, the preferred vectors pJVEDb and pJVEDc of the
invention are distinguishable from the preferred vector pJVEDa by a respective 
shift of one and two nucleotides at the level of the multiple cloning site. 
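
The effect of shifting the cloning site by one or two nucleotides can be illustrated by translating the same insert in its three forward reading frames. The sketch below uses the standard genetic code; the function name, the toy insert and the idea of modelling each vector as a shift of 0, 1 or 2 nucleotides are assumptions made for illustration, not a description of the actual vectors.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, shift=0):
    """Translate seq starting at offset shift (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(shift, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


insert = "ATGAAATAG"  # toy insert, not a mycobacterial sequence
for shift in range(3):
    print(shift, translate(insert, shift))
# 0 MK*
# 1 *N
# 2 EI
```

Only one of the three shifts places a given insert in phase with the downstream marker, which is why a set of three vectors is useful.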

[0113] Thus, the vectors of the invention are capable of expressing each
of the polypeptides which are capable of being encoded by an inserted 
mycobacterial DNA fragment. Said polypeptides, characterized in that they are 
therefore capable of being exported and/or secreted, and/or induced or 
repressed, or expressed constitutively during the infection, form part of the 
invention. 

[0114] The polypeptides of the invention whose amino acid sequences
are chosen from the amino acid sequences SEQ ID NOS 2-7, 9-13, 15-24, 26-30,
32, 34, 36-40, 42-45, 47-51, 53-55, 57-61, 63, 65-66, 68, 70-71, 73, 75, 77,
79-80, 82-83, 85, 87, 89, 91, 93-95, 97, 99, 101-103, 105, 107, 109, 111-112,
114-118, 120-121, 123-127, 129-132, 134-136, 138, 272-273, 140, 142, 144,
146-147, 149, 151, 153, 155, 157, 159, 161, 163-164, 166-168, 170-176, 178-183,
185-188, 190-194, 196-199, 201, 203-205, 207-208, 210, 212, 214-216,
218-219, 221-224, 226-227, 923-925, 229-237, 239-245, 247-249, 251-254, 256-257,
259, 261, 263-267, 269-271, 275-277, 279, 281, 283, 285, 287, 289, 291-296,
298-309, 311-316, 318-320, 322, 324, 326, 328-330, 332, 334, 336, 338,
340-345, 348-352, 354-356, 358, 360, 926-930, 362-363, 365-367, 369-370,
372-373, 375-379, 381-382, 384, 386, 388, 390-392, 394, 396, 398, 400-402,
404, 406, 408-409, 411, 413-418, 420, 422-425, 427-428, 430, 432, 434-436,
438-440, 442-446, 448-451, 453-455, 457-458, 460, 462, 464-468, 470-471, 473,
475, 477-481, 483-484, 486, 488, 490-494, 496, 498-500, 502-504, 506-509,
511-515, 517-518, 520-521, 523-527, 531-533, 535-536, 538-542, 543, 545,
547-549, 551, 553, 555, 557, 559-563, 565-568, 570, 572, 574-575, 577-579,
581-583, 585, 587, 589, 591-593, 595, 597, 599, 601-603, 605-607, 609, 611,
613, 615, 617, 619, 621, 623, 625, 627-628, 630, 632, 634, 636-639, 641-646,
648, 650, 652, 654-656, 658-659, 661, 663, 665, 931-933, 667-668, 670-673,
675, 677, 679-682, 684-685, 687-690, 692, 694, 696, 698-701, 703-716, 718-727,
729-732, 734-735, 737-738, 740, 742, 744-745, 747-751, 753-754, 756,
758, 760, 762-763, 765-766, 768, 770, 772-783, 785-793, 795-804, 806, 808,
810, 812, 814-816, 818-820, 822, 824, 826, 828-830, 832, 834, 836, 838, 840-841,
843, 845, 847, 849-863, 865-877, 879-882, 884, 886, 888-894, 896-900,
902-906, 908, 910, and represented respectively by Figures 1 to 24C (plates 1 to
150), Figures 27A to 28 (plates 152 to 155) and Figures 30 to 50F (plates 157 to
275) are in particular preferred.

[0115] Also forming part of the invention are the fragments or biologically
active fragments as well as the polypeptides which are homologous to said
polypeptides; fragment, biologically active fragment and polypeptides which are
homologous to a polypeptide being as defined below in the description.

[0116] The invention also relates to the polypeptides comprising a
polypeptide or one of their fragments according to the invention. 

[0117] The subject of the invention is also recombinant mycobacteria
containing a recombinant vector according to the invention which is described 
above. A preferred mycobacterium is a mycobacterium of the M. smegmatis 
type. 

[0118] M. smegmatis advantageously makes it possible to test the
efficiency of mycobacterial sequences for controlling the expression, export 
and/or secretion, and/or promoter activity of a given sequence, for example of a 
sequence encoding a marker such as alkaline phosphatase and/or luciferase. 

[0119] Another preferred mycobacterium is a mycobacterium of the
M. bovis type, for example the BCG strain which is currently used for vaccination 
against tuberculosis. 

[0120] Another preferred mycobacterium is a strain of M. tuberculosis, 
M. bovis or M. africanum potentially possessing all the appropriate regulatory 
systems. 

[0121] The inventors have thus characterized, in particular, a 
polynucleotide consisting of a nucleotide sequence which is present in all the 
tested strains of mycobacteria belonging to the Mycobacterium tuberculosis 
complex. This polynucleotide, called DP428, contains an open reading frame 
(ORF) encoding a polypeptide of about 12 kD. The open reading frame (ORF)
encoding the polypeptide DP428 extends from the nucleotide at position nt 451
to the nucleotide at position nt 861 of the sequence SEQ ID NO: 35, the
polypeptide DP428 having the following amino acid sequences SEQ ID NOS: 39
and 543:

MKTGTATTRRRLLAVLIALALPGAAVALLAEPSATGASDPCAASEVARTVGSVA 
KSMGDYLDSHPETNQVMTAVLQQQVGPGSVASLKAHFEANPKVASDLHALSQ 
PLTDLSTRCSLPISGLQAIGLMQAVQGARR. 

[0122] This molecular weight (MW) corresponds to the theoretical MW of 
the mature protein obtained after cleavage of the signal sequence, the MW of the 
protein or polypeptide DP428 being about 10 kD after potential anchorage to 
peptidoglycan and potential cleavage between S and G of the LPISG motif. 

[0123] This polynucleotide includes, on the one hand, an open reading 
frame corresponding to a structural gene and, on the other hand, the signals for 
regulating the expression of the coding sequence upstream and downstream of 
the latter. The polypeptide DP428 is composed of a signal peptide, a hydrophilic 
central region and a hydrophobic C-terminal region. The latter ends with two 
arginine residues (R), a retention signal, and is preceded by an LPISG motif 
which resembles the LPXTG motif for anchorage to peptidoglycan 
(Schneewind et al., 1995). 
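
As a quick consistency check on the figures given above, the short Python sketch below compares the length of the stated ORF (nt 451 to nt 861) with the length of the DP428 amino acid sequence, and locates the LPISG motif and the terminal arginines. It assumes, and this is an assumption rather than something stated in the text, that the stated ORF coordinates include the stop codon.

```python
# DP428 amino acid sequence as printed in the description.
DP428 = (
    "MKTGTATTRRRLLAVLIALALPGAAVALLAEPSATGASDPCAASEVARTVGSVA"
    "KSMGDYLDSHPETNQVMTAVLQQQVGPGSVASLKAHFEANPKVASDLHALSQ"
    "PLTDLSTRCSLPISGLQAIGLMQAVQGARR"
)

orf_nt = 861 - 451 + 1          # nucleotides in the ORF, ends included
codons = orf_nt // 3            # complete codons in the ORF
residues = codons - 1           # assuming the last codon is the stop codon

motif_pos = DP428.find("LPISG") + 1  # 1-based position of the L of LPISG

print(orf_nt, codons, residues, len(DP428))  # 411 137 136 136
print(motif_pos, DP428[-2:])                 # 117 RR
```

Under that assumption the 411-nucleotide ORF encodes exactly the 136 residues listed, the LPISG motif starts at residue 117, and the sequence ends with the two arginine residues mentioned in the text.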

[0124] Structural gene for the purposes of the present invention is
understood to mean a polynucleotide encoding a protein, a polypeptide or
alternatively a fragment of the latter, said polynucleotide comprising only the
sequence corresponding to the open reading frame (ORF), which excludes the
sequences on the 5' side of the open reading frame (ORF) which direct the
initiation of transcription.

[0125] Thus, the invention relates in particular to a polynucleotide whose
sequence is chosen from the nucleotide sequences SEQ ID NOS: 1, 8, 14, 25,
31, 33, and 35.

[0126] More particularly, the invention relates to a polynucleotide, 
characterized in that it comprises a polynucleotide chosen from: 

[0127] a) a polynucleotide whose sequence is chosen from the 
nucleotide sequences SEQ ID NOS: 1, 8, 14, 25, 31, 33, and 35,

[0128] b) a polynucleotide whose nucleic sequence is the sequence 
between the nucleotide at position nt 964 and the nucleotide at position nt 1234, 
ends included, of the sequence SEQ ID NOS: 1, 8, 14, 25, 31, and 33,

[0129] c) a polynucleotide whose sequence is complementary to the
sequence of a polynucleotide defined in a) or b), 

[0130] d) a polynucleotide whose sequence exhibits at least 50%
identity with a polynucleotide defined in a), b) or c), 

[0131] e) a polynucleotide which hybridizes, under high stringency 
conditions, with a sequence of a polynucleotide defined in a), b), c) or d), 

[0132] f) a fragment of at least 8 consecutive nucleotides of a
polynucleotide defined in a), b), c), d) or e). 

[0133] Nucleotide sequence, polynucleotide or nucleic acid is understood
to mean, according to the present invention, a double-stranded DNA, a
single-stranded DNA and products of transcription of said DNAs.

[0134] Percentage identity for the purpose of the present invention is
understood to mean a percentage identity between the bases of two
polynucleotides, this percentage being purely statistical and the differences
between the two polynucleotides being distributed randomly and over their entire
length.
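
A percentage identity of this kind can be sketched as a simple position-by-position comparison. The Python example below assumes that the two sequences have already been aligned to the same length; the function name is hypothetical, and the alignment step itself, which real comparisons require, is deliberately left out.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions carrying the same base in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


print(percent_identity("ATGC", "ATGA"))  # 75.0
print(percent_identity("AAAA", "AATT"))  # 50.0
```

With this measure, the second pair would just meet the threshold of at least 50% identity used in the definitions above.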

[0135] Hybridization under high stringency conditions means that the 
temperature and ionic strength conditions are chosen such that they allow the 
hybridization between two complementary DNA fragments to be maintained. 

[0136] By way of illustration, high stringency conditions of the 
hybridization step for the purposes of defining the polynucleotide fragments 
described above are advantageously the following: 

[0137] the hybridization is carried out at a temperature which is
preferably 65°C, in the presence of buffer marketed under the name rapid-hyb
buffer by Amersham (RPN 1636) and 100 µg/ml of E. coli DNA.

[0138] The washing steps may, for example, be the following: 

[0139] - two washes of 10 min, preferably at 65°C, in a 2 x SSC buffer
and 0.1% SDS; 

[0140] - two washes of 10 min, preferably at 65°C, in a 1 x SSC buffer 
and 0.1% SDS; 

[0141] - one wash of 10 min, preferably at 65°C, in a 0.1 x SSC buffer 
and 0.1% SDS.

[0142] - 1 x SSC corresponds to 0.15 M NaCl and 0.05 M Na citrate and
a 1 x Denhardt solution corresponds to 0.02% Ficoll, 0.02% of
polyvinylpyrrolidone and 0.02% of bovine serum albumin.

[0143] Advantageously, a nucleotide fragment corresponding to the
preceding definition will have at least 8 nucleotides, preferably at least 
12 nucleotides, and still more preferably at least 20 consecutive nucleotides of 
the sequence from which it is derived. The high stringency hybridization 
conditions described above for a polynucleotide having a size of about 200 bases 
will be adjusted by persons skilled in the art for oligonucleotides with a larger or a 
smaller size, according to the teaching of Sambrook et al., 1989. 

[0144] For the conditions for using the restriction enzymes with the aim of 
obtaining nucleotide fragments of the polynucleotides according to the invention, 
reference will be advantageously made to the manual by Sambrook et al., 1989.

[0145] Advantageously, a polynucleotide of the invention will contain at 
least one sequence comprising the stretch of nucleotides going from the 
nucleotide at position nt 964 to the nucleotide nt 1234 of the polynucleotide 
having the sequence SEQ ID NOS: 1, 8, 14, 25, 31, and 33.

[0146] The subject of the present invention is a polynucleotide according 
to the invention, characterized in that its nucleic sequence hybridizes with the 
DNA of a sequence of mycobacteria and preferably with the DNA of a sequence 
of mycobacteria belonging to the Mycobacterium tuberculosis complex. 

[0147] The polynucleotide is encoded by a polynucleotide sequence as 
described supra.

[0148] The subject of the present invention is also a polypeptide derived
from a mycobacterium, characterized in that it is present only in the mycobacteria
belonging to the Mycobacterium tuberculosis complex.

[0149] The invention also relates to a polypeptide characterized in that it 
comprises a polypeptide chosen from: 

[0150] a) a polypeptide whose amino acid sequence is included in an
amino acid sequence chosen from the amino acid sequences SEQ ID NOS 2-7,
9-13, 15-24, 26-30, 32, 34, 36-40, 42-45, 47-51, 53-55, 57-61, 63, 65-66, 68,
70-71, 73, 75, 77, 79-80, 82-83, 85, 87, 89, 91, 93-95, 97, 99, 101-103, 105, 107,
109, 111-112, 114-118, 120-121, 123-127, 129-132, 134-136, 138, 272-273, 140,
142, 144, 146-147, 149, 151, 153, 155, 157, 159, 161, 163-164, 166-168,
170-176, 178-183, 185-188, 190-194, 196-199, 201, 203-205, 207-208, 210, 212,
214-216, 218-219, 221-224, 226-227, 923-925, 229-237, 239-245, 247-249,
251-254, 256-257, 259, 261, 263-267, 269-271, 275-277, 279, 281, 283, 285, 287,
289, 291-296, 298-309, 311-316, 318-320, 322, 324, 326, 328-330, 332, 334,
336, 338, 340-345, 348-352, 354-356, 358, 360, 926-930, 362-363, 365-367,
369-370, 372-373, 375-379, 381-382, 384, 386, 388, 390-392, 394, 396, 398,
400-402, 404, 406, 408-409, 411, 413-418, 420, 422-425, 427-428, 430, 432,
434-436, 438-440, 442-446, 448-451, 453-455, 457-458, 460, 462, 464-468,
470-471, 473, 475, 477-481, 483-484, 486, 488, 490-494, 496, 498-500,
502-504, 506-509, 511-515, 517-518, 520-521, 523-527, 531-533, 535-536, 538-542,
543, 545, 547-549, 551, 553, 555, 557, 559-563, 565-568, 570, 572, 574-575,
577-579, 581-583, 585, 587, 589, 591-593, 595, 597, 599, 601-603, 605-607,
609, 611, 613, 615, 617, 619, 621, 623, 625, 627-628, 630, 632, 634, 636-639,
641-646, 648, 650, 652, 654-656, 658-659, 661, 663, 665, 931-933, 667-668,
670-673, 675, 677, 679-682, 684-685, 687-690, 692, 694, 696, 698-701,
703-716, 718-727, 729-732, 734-735, 737-738, 740, 742, 744-745, 747-751, 753-754,
756, 758, 760, 762-763, 765-766, 768, 770, 772-783, 785-793, 795-804, 806,
808, 810, 812, 814-816, 818-820, 822, 824, 826, 828-830, 832, 834, 836, 838,
840-841, 843, 845, 847, 849-863, 865-877, 879-882, 884, 886, 888-894,
896-900, 902-906, 908, and 910,

[0151] b) a polypeptide which is homologous to the polypeptide
defined in a),

[0152] c) a fragment of at least 5 amino acids of a polypeptide defined
in a) or b),

[0153] d) a biologically active fragment of a polypeptide defined in a),
b) or c).

[0154] The subject of the present invention is also a polypeptide whose
amino acid sequence is included in the amino acid sequences SEQ ID NOS:
2-7, 9-13, 15-24, 26-30, 32, 34, 36-40, or a polypeptide having the
amino acid sequence SEQ ID NO: 543.

[0155] Homologous polypeptide will be understood to designate the 
polypeptides exhibiting, relative to the natural polypeptide according to the 
invention such as the polypeptide DP428, certain modifications such as in 
particular a deletion, addition or substitution of at least one amino acid, a 
truncation, an extension, a chimeric fusion, and/or a mutation. Among the
homologous polypeptides, those whose amino acid sequence exhibits at least
30%, preferably 50%, homology with the amino acid sequences of the
polypeptides according to the invention are preferred. In the case of a
substitution, one or more consecutive or nonconsecutive amino acids are
replaced with "equivalent" amino acids. The expression "equivalent" amino acid
is intended here to designate any amino acid capable of being substituted for one
of the amino acids of the parent structure without, however, essentially modifying
the immunogenic properties of the corresponding peptides. In other words, the
equivalent amino acids will be those which allow the production of a polypeptide
having a modified sequence which allows the induction in vivo of antibodies or of
cells capable of recognizing the polypeptide whose amino acid sequence is
included in the amino acid sequence of the polypeptide according to the
invention, such as the amino acid sequences SEQ ID NOS: 2-7, 9-13, 15-24, 26-30,
32, 34, 36-40, or a polypeptide having the amino acid sequence SEQ ID NO:
543 (polypeptide DP428) or one of its above-defined fragments.

[0156] These equivalent aminoacyls may be determined either based on
their structural homology with the aminoacyls for which they are substituted, or 
on the results of cross-immunogenicity assays to which the different peptides are 
capable of giving rise. 

[0157] By way of example, there may be mentioned the possibilities of 
substitutions which are capable of being made without resulting in a profound 
modification of the immunogenicity of the corresponding modified peptides, the 
replacements, for example, of leucine with valine or isoleucine, of aspartic acid
with glutamic acid, of glutamine with asparagine and of arginine with lysine, and
the like, it being possible to naturally envisage the reverse substitutions under the
same conditions.
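
The substitutions listed above can be captured in a small lookup table. The Python sketch below encodes only the pairs explicitly named in the text (and their reverses); the names EQUIVALENT, is_equivalent and only_equivalent_substitutions are hypothetical, and the table is an illustration rather than a complete definition of "equivalent" amino acids.

```python
# Pairs named in the text, in one-letter code: Leu/Val, Leu/Ile, Asp/Glu, Gln/Asn, Arg/Lys.
LISTED_PAIRS = [("L", "V"), ("L", "I"), ("D", "E"), ("Q", "N"), ("R", "K")]

# Build a symmetric table so that the reverse substitutions are also accepted.
EQUIVALENT = {}
for first, second in LISTED_PAIRS:
    EQUIVALENT.setdefault(first, set()).add(second)
    EQUIVALENT.setdefault(second, set()).add(first)


def is_equivalent(original, replacement):
    """True if replacing original by replacement is one of the listed substitutions."""
    return replacement in EQUIVALENT.get(original, set())


def only_equivalent_substitutions(parent, variant):
    """True if variant differs from parent only by listed substitutions."""
    if len(parent) != len(variant):
        return False
    return all(a == b or is_equivalent(a, b) for a, b in zip(parent, variant))


print(only_equivalent_substitutions("LDQR", "VENK"))  # True
print(only_equivalent_substitutions("LDQR", "LDQA"))  # False
```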

[0158] Biologically active fragment will be understood to designate in
particular a fragment of an amino acid sequence of a polypeptide having at least 
one of the characteristics of the polypeptides according to the invention, in 
particular in that it is: 

[0159] - capable of being exported and/or secreted by a mycobacterium,
and/or of being induced or repressed during infection with the mycobacterium; 
and/or 

[0160] - capable of inducing, repressing or modulating, directly or 
indirectly, a mycobacterium virulence factor; and/or 

[0161] - capable of inducing an immunogenicity reaction directed against 
mycobacteria; and/or 

[0162] - capable of being recognized by an antibody which is specific for 
mycobacterium. 

[0163] Polypeptide fragment is understood to designate a polypeptide 
comprising a minimum of 5 amino acids, preferably 10 amino acids and more
preferably 15 amino acids.

[0164] A polypeptide of the invention, or one of its fragments, as defined 
above, is capable of being specifically recognized by the antibodies present in 
the serum of patients infected by mycobacteria and preferably bacteria belonging 
to the Mycobacterium tuberculosis complex or by cells of the infected host. 


[0165] Thus, forming part of the invention are the fragments of the
polypeptide whose amino acid sequence is included in the amino acid sequence 
of a polypeptide according to the invention, such as the amino acid sequences 
SEQ ID NOS: 2-7, 9-13, 15-24, 26-30, 32, 34, 36-40, or a polypeptide having an 
amino acid sequence SEQ ID NO: 543, which may be obtained by cleavage of 
said polypeptide with a proteolytic enzyme, such as trypsin or chymotrypsin or 
collagenase, or with a chemical reagent, such as cyanogen bromide (CNBr) or 
alternatively by placing a polypeptide according to the invention such as the 
polypeptide DP428 in a very acidic environment, for example at pH 2.5. 
Preferred peptide fragments according to the invention, for use in diagnosis or in 
vaccination, are the fragments contained in regions of a polypeptide according to 
the invention such as the polypeptide DP428 which are capable of being 
naturally exposed to the solvent and to thus exhibit substantial immunogenicity 
properties. Such peptide fragments may be prepared either by chemical 
synthesis, from hosts transformed with an expression vector according to the
invention containing a nucleic acid allowing the expression of said fragments, 
placed under the control of appropriate regulatory and/or expression elements or 
alternatively by chemical or enzymatic cleavage. 
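
The fragments expected from two of the cleavage agents mentioned above can be predicted in silico. The Python sketch below uses the commonly cited simplified rules (trypsin cuts after lysine or arginine unless the next residue is proline; cyanogen bromide cuts after methionine); the function names and the toy peptide are hypothetical, and real digests are rarely this clean.

```python
def cleave_after(sequence, residues, blocked_by=""):
    """Cut sequence after each residue in residues, unless the next residue is in blocked_by."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence[:-1]):
        if residue in residues and sequence[i + 1] not in blocked_by:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def trypsin_digest(sequence):
    # Simplified rule: after K or R, but not before P.
    return cleave_after(sequence, "KR", blocked_by="P")


def cnbr_digest(sequence):
    # Cyanogen bromide cleaves on the C-terminal side of methionine.
    return cleave_after(sequence, "M")


peptide = "AKRPMGK"  # toy peptide for illustration
print(trypsin_digest(peptide))  # ['AK', 'RPMGK']
print(cnbr_digest(peptide))     # ['AKRPM', 'GK']
```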

[0166] Analysis of the hydrophilicity of the polypeptide DP428 was
carried out with the aid of the DNA Strider™ software (marketed by CEA Saclay)
on the basis of a calculation of the hydrophilic character of the region encoding
DP428 of SEQ ID NO: 543. The results of this analysis are presented in
Figure 54 where the hydrophilicity index is detailed, for each of the amino acids
(AA) having a defined position in SEQ ID NO: 543. The higher the hydrophilicity
index, the more the amino acid considered is likely to be exposed to the solvent
in the native molecule, and is subsequently likely to exhibit a high degree of
antigenicity. Thus, a stretch of at least seven amino acids possessing a high
hydrophilicity index (>0.3) can constitute the basis of the structure of an
immunogenic candidate peptide according to the present invention.
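
The selection criterion just stated (at least seven consecutive residues with an index above 0.3) amounts to a simple scan of the per-residue values. The Python sketch below shows one way to do this; the function name and the example index values are invented for illustration and are not the values of Figure 54.

```python
def hydrophilic_stretches(indices, threshold=0.3, min_len=7):
    """Return (start, end) pairs, 1-based and inclusive, of runs above threshold."""
    stretches, run_start = [], None
    for pos, value in enumerate(indices, start=1):
        if value > threshold:
            if run_start is None:
                run_start = pos          # a new run begins here
        else:
            if run_start is not None and pos - run_start >= min_len:
                stretches.append((run_start, pos - 1))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(indices) - run_start + 1 >= min_len:
        stretches.append((run_start, len(indices)))
    return stretches


# Hypothetical per-residue hydrophilicity indices.
example = [0.1, 0.5, 0.6, 0.4, 0.7, 0.9, 0.35, 0.8, 0.2, 0.5, 0.5]
print(hydrophilic_stretches(example))  # [(2, 8)]
```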

[0167] The cellular immune responses of the host to a polypeptide 
according to the invention can be demonstrated according to the techniques 
described by Colignon et al., 1996. 

[0168] From the data of the hydrophilicity map presented in Figure 54, 
the inventors were able to define regions of the polypeptide DP428 which are 
preferably exposed to the solvent, more particularly the region located between 
amino acids 55 and 72 of the sequence SEQ ID NO: 543 and the region located 
between amino acids 99 and 107 of SEQ ID NO: 543.

[0169] The peptide regions of the polypeptide DP428 which are defined 
above may be advantageously used for the production of immunogenic 
compositions or of vaccine compositions according to the invention. 

[0170] The polynucleotides characterized in that they encode a 
polypeptide according to the invention also form part of the invention. 

[0171] The invention also relates to the nucleic acid sequences which 
can be used as probes or primers, characterized in that said sequences are 
chosen from the nucleic acid sequences of polynucleotides according to the 
invention.

[0172] The invention relates, in addition, to the use of a nucleic acid
sequence of polynucleotides according to the invention as a probe or a primer for
the detection and/or amplification of a nucleic acid sequence. Among these
nucleic acid sequences according to the invention which can be used as probes
or primers there are preferred the nucleic acid sequences of the invention,
characterized in that said sequences are sequences, or their complementary
sequence, between the nucleotide at position nt 964 and the nucleotide at
position nt 1234, ends included, of the sequence SEQ ID NOS: 1, 8, 14, 25, 31,
and 33.

[0173] Among the polynucleotides according to the invention which can 
be used as nucleotide primers, the polynucleotides having the sequences 
SEQ ID NO: 528 and SEQ ID NO: 529 are particularly preferred. 

[0174] The polynucleotides according to the invention may thus be used 
to select nucleotide primers, in particular for the PCR technique (Erlich, 1989;
Innis et al., 1990; and Rolfs et al., 1991).

[0175] This technique requires the choice of oligonucleotide pairs 
flanking the fragment which has to be amplified. Reference may be made, for 
example, to the technique described in American patent US No. 4,683,202.
These oligodeoxyribonucleotide or oligoribonucleotide primers advantageously
have a length of at least 8 nucleotides, preferably of at least 12 nucleotides, and
still more preferably of at least 20 nucleotides. Primers having a length of
between 8 and 30 and preferably 12 and 22 nucleotides will be preferred in
particular. One of the two primers is complementary to the (+) strand [forward
primer] of the template and the other primer is complementary to the (-) strand
[backward primer]. It is important that the primers do not possess a secondary
structure or sequences which are complementary to each other. Moreover, the
length and the sequence of each primer should be chosen so that the primers do
not hybridize with other nucleic acids from prokaryotic or eukaryotic cells, in
particular with the nucleic acids from other pathogenic mycobacteria, or with
human DNA or RNA which may possibly contaminate the biological sample.
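
The length and complementarity criteria above lend themselves to a quick screening check. The following is a minimal sketch, not a validated primer-design tool: the helper names and the `max_run` limit of 4 consecutive pairing bases are assumptions chosen for illustration, and specificity against other genomes is not checked.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def length_ok(primer: str, min_len: int = 8, max_len: int = 30) -> bool:
    """Check the 8 to 30 nucleotide range given in the text."""
    return min_len <= len(primer) <= max_len


def longest_complementary_run(a: str, b: str) -> int:
    """Length of the longest stretch of `a` that can base-pair, antiparallel,
    with a stretch of `b` (longest common substring of a and revcomp(b))."""
    a = a.upper()
    rc_b = reverse_complement(b)
    best = 0
    prev = [0] * (len(rc_b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(rc_b) + 1)
        for j in range(1, len(rc_b) + 1):
            if a[i - 1] == rc_b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def primer_pair_ok(forward: str, backward: str, max_run: int = 4) -> bool:
    """Screen a pair: both lengths in range, and no self- or cross-
    complementary stretch longer than `max_run` (an assumed cut-off)."""
    return (
        length_ok(forward)
        and length_ok(backward)
        and longest_complementary_run(forward, forward) <= max_run
        and longest_complementary_run(backward, backward) <= max_run
        and longest_complementary_run(forward, backward) <= max_run
    )
```

A pair that passes this screen would still need to be checked against non-target mycobacterial and human sequences, as the paragraph above requires.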

[0176] The results presented in Figure 51 show that the sequence 
encoding the polypeptide DP428 (SEQ ID NO: 543) is not found in the DNAs of
M. fortuitum, M. simiae, M. avium, M. chelonae, M. flavescens, M. gordonae,
M. marinum and M. kansasii. 

[0177] The amplified fragments may be identified after agarose or
polyacrylamide gel electrophoresis or after capillary electrophoresis, or 
alternatively after a chromatographic technique (gel filtration, hydrophobic 
chromatography or ion-exchange chromatography). The specificity of the 
amplification may be checked by molecular hybridization using, as probes, the 
nucleotide sequences of polynucleotides of the invention, plasmids containing 
these sequences or their amplification products. 

[0178] The amplified nucleotide fragments may be used as reagents in
hybridization reactions in order to detect the presence, in a biological sample, of 
a target nucleic acid having a sequence which is complementary to that of said 
amplified nucleotide fragments. 



PATENT 
Customer No. 22,852 
Attorney Docket No. 03715-0062-01

[0179] Among the polynucleotides according to the invention which can
be used as nucleotide probes, the polynucleotide fragment comprising the 
sequence between the nucleotide at position nt 964 and the nucleotide at 
position nt 1234, ends included, of the sequence SEQ ID NO: 1 is most 
particularly preferred. 

[0180] These probes and amplicons may be unlabeled, or may be labeled
with radioactive elements or with nonradioactive molecules such as enzymes or
fluorescent elements.

[0181] The invention also relates to the nucleotide fragments which are 
capable of being obtained by amplification with the aid of primers according to 
the invention. 

[0182] Other techniques for the amplification of the target nucleic acid
may be advantageously used as alternatives to PCR.

[0183] The SDA (Strand Displacement Amplification) technique (Walker
et al., 1992) is an isothermal amplification technique whose principle is based on
the capacity of a restriction enzyme to cut one of the two strands of its
recognition site which is in the form of a hemiphosphorothioate and on the
property of a DNA polymerase to initiate the synthesis of a new DNA strand from
the 3'OH end created by the restriction enzyme and to displace the strand
previously synthesized which is present downstream.

[0184] The polynucleotides of the invention, in particular the primers 
according to the invention, may also be used in other methods of amplifying a 
target nucleic acid, such as: 
[0185] - the TAS (Transcription-based Amplification System) technique
described by Kwoh et al. in 1989; 

[0186] - the 3SR (Self-Sustained Sequence Replication) technique 
described by Guatelli et al. in 1990; 

[0187] - the NASBA (Nucleic Acid Sequence Based Amplification) 
technique described by Kievits et al. in 1991;

[0188] - the TMA (Transcription Mediated Amplification) technique. 

[0189] The polynucleotides of the invention may also be used in 
techniques for the amplification or modification of the nucleic acid serving as 
probe, such as: 

[0190] - the LCR (Ligase Chain Reaction) technique described by 
Landegren et al. in 1988 and improved by Barany et al. in 1991, which uses a
heat-stable ligase; 

[0191] - the RCR (Repair Chain Reaction) technique described by Segev 
in 1992; 

[0192] - the CPR (Cycling Probe Reaction) technique described by 
Duck et al. in 1990; 

[0193] - the Q-beta-replicase amplification technique described by 
Miele et al. in 1983 and improved in particular by Chu et al. in 1986, Lizardi et al. 
in 1988 and then by Burg et al. as well as Stone et al. in 1996. 

[0194] In the case where the target polynucleotide to be detected is an
RNA, for example an mRNA, a reverse transcriptase-type enzyme will be
advantageously used, prior to carrying out an amplification reaction using the
primers according to the invention or a method of detection using the probes
of the invention, in order to obtain a cDNA from the RNA contained in the
biological sample. The cDNA obtained will then serve as target for the primers or
probes used in the method of amplification or detection according to the
invention.

[0195] The detection probe will be chosen so that it hybridizes with the
amplicon generated. Such a detection probe will advantageously have a
sequence of at least 12 nucleotides, in particular of at least 15 nucleotides, and
preferably of at least 200 nucleotides.

[0196] The nucleotide probes according to the invention are capable of 
detecting mycobacteria and preferably bacteria belonging to the Mycobacterium 
tuberculosis complex, more particularly because of the fact that these 
mycobacteria possess in their genome at least one copy of polynucleotides 
according to the invention. These probes according to the invention are capable, 
for example, of hybridizing with the nucleotide sequence of a polypeptide 
according to the invention, more particularly any oligonucleotide hybridizing with 
the sequences SEQ ID NOS: 1, 8, 14, 25, 31, and 33 encoding the
M. tuberculosis polypeptide DP428 and not exhibiting a cross-hybridization
reaction or an amplification reaction (PCR) with, for example, sequences present
in mycobacteria not belonging to the Mycobacterium tuberculosis complex. The
nucleotide probes according to the invention hybridize specifically with a DNA or
RNA molecule of a polynucleotide according to the invention, under high
stringency hybridization conditions as given in the form of an example above.
[0197] The nonlabeled sequences may be used directly as probes.
However, the sequences are generally labeled with a radioactive element (32P,
35S, 3H, 125I) or with a nonradioactive molecule (biotin, acetylaminofluorene,
digoxigenin, 5-bromodeoxyuridine, fluorescein) in order to obtain probes which 
can be used for many applications. 

[0198] Examples of nonradioactive labelings of probes are described, for 
example, in French patent No. 78,10975 or by Urdea et al. or by Sanchez- 
Pescador et al. in 1988. 

[0199] In the latter case, it will also be possible to use one of the labeling 
methods described in patents FR 2,422,956 and FR 2,518,755. The 
hybridization technique may be carried out in various ways (Matthews et al., 
1988). The most common method consists in immobilizing the nucleic acid 
extracted from mycobacterial cells onto a support (such as nitrocellulose, nylon, 
polystyrene) and in incubating, under well-defined conditions, the immobilized 
target nucleic acid with the probe. After hybridization, the excess probe is 
removed and the hybrid molecules formed are detected by the appropriate 
method (measurement of the radioactivity, of the fluorescence or of the 
enzymatic activity linked to the probe). 

[0200] Advantageously, the labeled nucleotide probes according to the 
invention may have a structure such that they make amplification of the 
radioactive or nonradioactive signal possible. An amplification system 
corresponding to the above definition will comprise detection probes in the form
of a branched, ramified DNA such as those described by Urdea et al. in 1991.
According to this technique, several types of probe, in particular a capture probe,
to immobilize the target DNA or RNA to a support, and a detection probe will be
advantageously used. The detection probe binds a "branched" DNA having a
ramified structure. The branched DNA in turn is capable of binding
oligonucleotide probes which are themselves coupled to alkaline phosphatase
molecules. The activity of this enzyme is then detected using a
chemiluminescent substrate, for example a derivative of dioxetane phosphate.

[0201] According to another advantageous embodiment of the nucleic
acid probes according to the invention, they can be covalently or noncovalently
immobilized on a support and used as capture probes. In this case, a probe
termed "capture probe" is immobilized on a support and serves to capture,
through specific hybridization, the target nucleic acid obtained from the biological
sample to be tested. If necessary, the solid support is separated from the
sample and the duplex formed between the capture probe and the target nucleic
acid is then detected by means of a second probe termed "detection probe"
which is labeled with an easily detectable element.

[0202] The oligonucleotide fragments may be obtained from the 
sequences according to the invention by cleavage with restriction enzymes or by 
chemical synthesis according to conventional methods, for example according to 
the method described in European patent No. EP-0,305,929 (Millipore 
Corporation) or by other methods. 
[0203] An appropriate method of preparing the nucleic acids of the
invention comprising a maximum of 200 nucleotides (or 200 bp in the case of 
double-stranded nucleic acids) comprises the following steps: 

[0204] - synthesis of DNA using the automated
beta-cyanoethylphosphoramidite method described in 1986,

[0205] - cloning of the nucleic acids thus obtained into an appropriate 
vector and recovery of the nucleic acids by hybridization with an appropriate 
probe. 

[0206] A method of preparation, by the chemical route, of nucleic acids 
according to the invention having a length greater than 200 nucleotides (or 
200 bp in the case of double-stranded nucleic acids) comprises the following 
steps: 

[0207] - assembly of chemically synthesized oligonucleotides, provided 
at their end with different restriction sites, whose sequences are compatible with 
the stretch of amino acids of the natural polypeptide according to the principle 
described in 1983,

[0208] - cloning of the nucleic acids thus obtained into an appropriate 
vector and recovery of the desired nucleic acids by hybridization with an 
appropriate probe. 

[0209] The nucleotide probes used for recovering the desired nucleic 
acids in the abovementioned methods generally consist of 8 to 200 nucleotides 
of the polypeptide sequence according to the invention and are capable of 
hybridizing with the nucleic acid tested for under the hybridization conditions 
defined above. The synthesis of these probes may be carried out according to
the automated beta-cyanoethylphosphoramidite method described in 1986.

[0210] The oligonucleotide probes according to the invention may be 
used in a detection device comprising an oligonucleotide array library. An 
exemplary embodiment of such an array library may consist of an array of probe 
oligonucleotides which are attached to a support, the sequence of each probe of 
a given length being situated with a shift of one or more bases relative to the 
preceding probe, each of the probes of the array arrangement thus being 
complementary to a distinct sequence of the target DNA or RNA to be detected
and each probe of known sequence being attached at a predetermined position 
of the support. The target sequence to be detected may be advantageously 
labeled radioactively or nonradioactively. When the labeled target sequence is 
brought into contact with the array device, it forms hybrids with the probes having 
complementary sequences. A nuclease treatment, followed by washing, makes 
it possible to remove the probe-target sequence hybrids which are not perfectly 
complementary. Because of the precise knowledge of the sequence of a probe 
at a given position of the array, it is then possible to deduce the nucleotide 
sequence of the target DNA or RNA sequence. This technique is particularly
effective when matrices of oligonucleotide probes of a large size are used. 
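
The tiling arrangement described above can be illustrated as follows. This is a simplified sketch under stated assumptions: the function names are hypothetical, every tiled probe is assumed to form a perfect hybrid, and the read-out is reduced to reassembling the target from the ordered probes rather than modelling the nuclease and washing steps.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tiled_probes(target: str, probe_len: int, shift: int = 1):
    """Build (array_position, probe) pairs; each probe is complementary to a
    window of the target, and successive windows are offset by `shift` bases."""
    target = target.upper()
    if not 0 < shift <= probe_len <= len(target):
        raise ValueError("need 0 < shift <= probe_len <= len(target)")
    last_start = len(target) - probe_len
    return [
        (start, reverse_complement(target[start:start + probe_len]))
        for start in range(0, last_start + 1, shift)
    ]


def reconstruct_target(probes, shift: int = 1) -> str:
    """Deduce the target from the probes that hybridized, using the known
    sequence and position of each probe on the support."""
    windows = [reverse_complement(probe) for _, probe in sorted(probes)]
    sequence = windows[0]
    for window in windows[1:]:
        sequence += window[-shift:]  # each window adds `shift` new bases
    return sequence


probes = tiled_probes("ATGCGTAC", probe_len=4)
print(probes[0])                   # first probe on the array
print(reconstruct_target(probes))  # target deduced from probe positions
```

The toy target and probe length are made up for the example; practical arrays use much longer targets and larger probe sets.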

[0211] An alternative to the use of a labeled target sequence may consist 
of using a support allowing a "bioelectronic" detection of the hybridization of the 
target sequence with the probes of the array support, when said support consists 
of or comprises a material capable of acting, for example, as an electron donor at
the positions of the array where a hybrid has been formed. Such an electron- 
donating material is for example gold. The detection of the nucleotide sequence 
of the target DNA or RNA is then determined by an electronic device. 

[0212] An exemplary embodiment of a biosensor, as defined above, is 
described in European patent application No. EP-0,721,016 in the name of 
Affymax Technologies N.V. or in American patent No. US 5,202,231 in the name 
of Drmanac. 

[0213] The subject of the invention is also the hybrid polynucleotides 
resulting: 

[0214] - either from the formation of a hybrid molecule between an RNA 
or a DNA (genomic DNA or cDNA) obtained from a biological sample with a 
probe or a primer according to the invention, 

[0215] - or from the formation of a hybrid molecule between an RNA or a
DNA (genomic DNA or cDNA) obtained from a biological sample with a 
nucleotide fragment amplified with the aid of a pair of primers according to the 
invention. 

[0216] cDNA for the purposes of the invention is understood to mean a 
DNA molecule obtained by causing a reverse transcriptase type enzyme to act 
on an RNA molecule, in particular a messenger RNA (mRNA) molecule, 
according to the techniques described in Sambrook et al. in 1989. 

[0217] The subject of the present invention is also a family of 
recombinant plasmids, characterized in that they contain at least one nucleotide 
sequence of a polynucleotide according to the invention. According to an 
advantageous embodiment of said plasmid it comprises the nucleotide
sequences SEQ ID NOS: 1, 8, 14, 25, 31, and 33, or a fragment thereof.

[0218] Another subject of the present invention is a vector for the cloning, 
expression and/or insertion of a sequence, characterized in that it comprises a
nucleotide sequence of a polynucleotide according to the invention at a site 
which is not essential for its replication, where appropriate under the control of 
regulatory elements capable of playing a role in the expression of the polypeptide 
DP428, in a given host. 

[0219] Specific vectors are for example plasmids, phages, cosmids, 
phagemids and YACs. 

[0220] These vectors are useful for transforming host cells so as to clone 
or express the nucleotide sequences of the invention. 

[0221] The invention also comprises the host cells transformed with a
vector according to the invention. 

[0222] Preferably, the host cells are transformed under conditions 
allowing the expression of a recombinant polypeptide according to the invention. 

[0223] A preferred host cell according to the invention is the E. coli
strain transformed with the plasmid pDP428 deposited on 28 January 1997 at the
CNCM under the No. I-1818 or transformed with the plasmid pM1C25 which was
deposited on 4 August 1998 at the CNCM under the No. I-2062 or a
mycobacterium belonging to a strain of M. tuberculosis, M. bovis or M. africanum
potentially possessing all the appropriate regulatory systems.
[0224] It is now easy to produce proteins or polypeptides in a relatively
large quantity by genetic engineering using, as expression vectors, plasmids, 
phages or phagemids. All or part of the DP428 gene, or any polynucleotide 
according to the invention, may be inserted into an appropriate expression vector 
in order to produce in vitro a polypeptide according to the invention, in particular 
the polypeptide DP428. Said polypeptide may be attached to a microplate in 
order to develop a serological test intended to search, for diagnostic purposes, 
for the specific antibodies in patients suffering from tuberculosis.

[0225] Thus, the present invention relates to a method of preparing a 
polypeptide, characterized in that it uses a vector according to the invention. 
More particularly, the invention relates to a method of preparing a polypeptide of 
the invention comprising the following steps: 

[0226] - where appropriate, the prior amplification, according to the PCR
technique, of the quantity of nucleotide sequences encoding said polypeptide
with the aid of two DNA primers chosen so that one of these primers is identical
to the first 10 to 25 nucleotides of the nucleotide sequence encoding said
polypeptide, while the other primer is complementary to the last 10 to
25 nucleotides (or hybridizes with these last 10 to 25 nucleotides) of said
nucleotide sequence, or conversely so that one of these primers is identical to
the last 10 to 25 nucleotides of said sequence, while the other primer is
complementary to the first 10 to 25 nucleotides (or hybridizes with the first 10 to
25 nucleotides) of said nucleotide sequence, followed by the introduction of said
sequences thus amplified into an appropriate vector.
[0227] - the culture, in an appropriate culture medium, of a cellular host
which has been previously transformed with an appropriate vector containing a 
nucleic acid according to the invention comprising the nucleotide sequence 
encoding said polypeptide, and 

[0228] - the separation, from said culture medium, of said polypeptide 
produced by said transformed cellular host. 
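
The primer choice in the optional amplification step of the method above can be sketched as follows. This is illustrative only: the function name `amplification_primers` and the toy coding sequence are assumptions, and the sketch covers only the first of the two orientations described (one primer identical to the first n nucleotides, the other complementary to the last n).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplification_primers(coding_seq: str, n: int = 20):
    """Return (first_primer, second_primer), both written 5'->3'.

    first_primer  : identical to the first n nucleotides of the sequence.
    second_primer : complementary to the last n nucleotides of the sequence.
    """
    if not 10 <= n <= 25:
        raise ValueError("primer length must be between 10 and 25 nucleotides")
    if len(coding_seq) < n:
        raise ValueError("coding sequence is shorter than the primer length")
    seq = coding_seq.upper()
    return seq[:n], reverse_complement(seq[-n:])


# Toy 24-nucleotide sequence, not a sequence of the invention.
first, second = amplification_primers("ATGGCCAAATGGGGCCCTTTAAAT", n=10)
print(first, second)
```

Swapping the roles of the two ends gives the converse arrangement mentioned in the same step.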

[0229] The subject of the invention is also a polypeptide which is capable 
of being obtained by a method of the invention as described above. 

[0230] The peptides according to the invention may also be prepared by 
techniques which are conventionally used in the field of peptide synthesis. This 
synthesis may be carried out in homogeneous solution or in solid phase. 

[0231] For example, the technique of synthesis in homogeneous solution 
described by Houben-Weyl in 1974 will be used.

[0232] This method of synthesis consists in successively condensing in 
pairs the successive aminoacyls in the required order, or in condensing 
aminoacyls and fragments formed beforehand and already containing several 
aminoacyls in the appropriate order, or alternatively several fragments thus
prepared beforehand, it being understood that care will be taken to protect 
beforehand all the reactive functions carried by these aminoacyls or fragments, 
with the exception of the amine functions of one and the carboxyl functions of the 
other or vice versa, which should normally be involved in the formation of the 
peptide bonds, in particular after activation of the carboxyl function, according to 
methods well known in peptide synthesis. As a variant, use may be made of 
coupling reactions using conventional coupling reagents, of the carbodiimide
type, such as for example 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide.

[0233] When the aminoacyl used possesses an additional acid function 
(in particular in the case of glutamic acid), these functions will be protected, for 
example with t-butyl ester groups. 

[0234] In the case of gradual synthesis, amino acid by amino acid, the 
synthesis preferably starts with the condensation of the C-terminal amino acid 
with the amino acid which corresponds to the neighboring aminoacyl in the 
desired sequence, and so on, step by step, up to the N-terminal amino acid. 

[0235] According to another preferred technique of the invention, the one 
described by Merrifield is used. 

[0236] To manufacture a peptide chain according to the Merrifield 
method, use is made of a very porous polymer resin onto which the first
C-terminal amino acid of the chain is attached. This amino acid is attached to the
resin via its carboxyl group and its amine function is protected, for example with 
the t-butyloxycarbonyl group. 

[0237] When the first C-terminal amino acid is thus attached to the resin, 
the group for protecting the amine function is removed by washing the resin with 
an acid. 

[0238] In the case where the group for protecting the amine function is 
the t-butyloxycarbonyl group, it may be removed by treating the resin with 
trifluoroacetic acid. 
[0239] The second amino acid which provides the second aminoacyl of
the desired sequence, from the C-terminal aminoacyl residue, is then coupled
with the deprotected amine function of the first C-terminal amino acid attached to
the chain. Preferably, the carboxyl function of this second amino acid is 
activated, for example with dicyclohexylcarbodiimide, and the amine function is 
protected, for example with t-butyloxycarbonyl. 

[0240] The first portion of the desired peptide chain is thus obtained 
which comprises two amino acids, and whose terminal amine function is
protected. As before, the amine function is deprotected and it is then possible to 
proceed to the attachment of the third aminoacyl, under conditions similar to 
those for the addition of the second C-terminal amino acid. 

[0241] The amino acids which will constitute the peptide chain will thus 
be attached, one after the other, to the amino group, each time deprotected 
beforehand, of the portion of the peptide chain which is already formed and
which is attached to the resin. 

[0242] When the entire desired peptide chain is formed, the groups for
protecting the different amino acids constituting the peptide chain are removed 
and the peptide is detached from the resin, for example with the aid of 
hydrofluoric acid. 

[0243] Preferably, said polypeptides which are capable of being obtained 
by a method of the invention as described above will comprise a region exposed 
to the solvent and will have a length of at least 20 amino acids. 
[0244] According to another embodiment of the invention, said
polypeptides are specific to mycobacteria of the Mycobacterium tuberculosis 
complex and are not therefore recognized by antibodies specific for other 
mycobacterial proteins. 

[0245] The invention relates, in addition, to hybrid polypeptides having at
least one polypeptide according to the invention and a sequence of a polypeptide
capable of inducing an immune response in humans or animals.

[0246] Advantageously, the antigenic determinant is such that it is
capable of inducing a humoral and/or cellular response. 

[0247] Such a determinant may comprise a polypeptide according to the 
invention, in glycosylated form, which is used to obtain immunogenic
compositions capable of inducing the synthesis of antibodies directed against
multiple epitopes. Said glycosylated polypeptides also form part of the invention.

[0248] These hybrid molecules may consist in part of a polypeptide- 
carrying molecule according to the invention combined with a portion, in 
particular an epitope of the diphtheria toxin, the tetanus toxin, a hepatitis B virus 
surface antigen (patent FR 79 21811), the VP1 antigen of the poliomyelitis virus
or any other viral or bacterial toxin or antigen. 

[0249] Advantageously, said antigenic determinant corresponds to an
antigenic determinant of immunogenic proteins of 45/47 kD of M. tuberculosis
(international application PCT/FR 96/0166), or alternatively which are selected
for example from ESAT6 (Harboe et al., 1996, Andersen et al., 1995, and
Sorensen et al., 1995) and DES (PCT/FR 97/00923, Gicquel et al.).
[0250] A viral antigen, as defined above, will be preferably a hepatitis
virus surface or envelope protein, for example the hepatitis B surface protein in
one of its S, S-preS1, S-preS2 or S-preS2-preS1 forms or alternatively a protein
of a hepatitis A virus, or of a hepatitis non-A, non-B virus, such as a hepatitis C,
E or delta virus.

[0251] More particularly, a viral antigen as defined above will be the 
whole or part of one of the glycoproteins encoded by the genome of the HIV-1 
virus (patents GB 8324800, EP 84401834 or EP 85905513) or of the HIV-2 virus 
(EP 87400151), and in particular the whole or part of a protein selected from gag, 
pol, nef or env of HIV-1 or HIV-2. 

[0252] The methods for synthesizing the hybrid molecules include the 
methods used in genetic engineering to construct hybrid polynucleotides 
encoding the desired polypeptide sequences. Reference may be 
advantageously made, for example, to the technique for the production of genes 
encoding fusion proteins described by Minton in 1984. 

[0253] Said hybrid polynucleotides encoding a hybrid polypeptide as well 
as the hybrid polypeptides according to the invention characterized in that they 
are recombinant proteins obtained by the expression of said hybrid 
polynucleotides also form part of the invention.

[0254] The polypeptides according to the invention may advantageously 
be used in a method for the in vitro detection of antibodies directed against said 
polypeptides, in particular the polypeptide DP428, and also of antibodies directed 
against a bacterium of the Mycobacterium tuberculosis complex, in a biological 
sample (biological tissue or fluid) capable of containing them, this method
comprising bringing this biological sample into contact with a polypeptide
according to the invention under conditions allowing an immunological reaction
in vitro between said polypeptide and the antibodies which may be present in the
biological sample, and detecting in vitro the antigen-antibody complexes which
may be formed.

[0255] The polypeptides according to the invention may also and 
advantageously be used in a method for the detection of an infection by a 
bacterium of the Mycobacterium tuberculosis complex in a mammal based on the 
in vitro detection of a cellular reaction indicating prior sensitization of the mammal 
to said polypeptide, such as for example cell proliferation or the synthesis of
proteins such as interferon-gamma. This method for the detection of an infection
by a bacterium of the Mycobacterium tuberculosis complex in a mammal is
characterized in that it comprises the following steps: 

[0256] a) preparation of a biological sample containing cells of said 
mammal, more particularly cells of the immune system of said mammal and still 
more particularly T cells; 

[0257] b) incubation of the biological sample of step a) with a 
polypeptide according to the invention; 

[0258] c) detection of a cellular reaction indicating prior sensitization of 
the mammal to said polypeptide such as for example cell proliferation and/or the 
synthesis of proteins such as interferon-gamma. 
[0259] Cell proliferation may be measured, for example, by incorporation
of 3H-thymidine.

[0260] Also forming part of the invention are the methods for the 
detection of a delayed hypersensitivity reaction (DTH), characterized in that they 
use a polypeptide according to the invention.

[0261] Preferably, the biological sample consists of a fluid, for example a 
human or animal serum, blood, biopsies, bronchoalveolar fluid or pleural fluid. 

[0262] Any conventional procedure may be used to carry out such a 
detection. 

[0263] By way of example, a preferred method uses immunoenzymatic 
procedures such as the ELISA, immunofluorescence or radioimmunoassay (RIA) 
technique and the like. 

[0264] Thus, the invention also relates to the polypeptides according to 
the invention, labeled with the aid of a suitable marker of the enzymatic, 
fluorescent or radioactive type. 

[0265] Such methods comprise, for example, the following steps: 

[0266] - deposition of predetermined quantities of a polypeptide 
composition according to the invention into the wells of a microtiter plate, 

[0267] - introduction into said wells of increasing dilutions of serum or of 
another biological sample as defined above, before being analyzed, 

[0268] - incubation of the microplate, 

[0269] - introduction into the wells of the microtiter plate of labeled 
antibodies directed against human or animal immunoglobulins, the labeling of 
these antibodies having been carried out with the aid of an enzyme selected from
those which are capable of hydrolyzing a substrate while modifying its radiation
absorption, at least at a defined wavelength, for example at 550 nm,

[0270] - detection, by comparing with a control, of the quantity of 
substrate hydrolyzed. 
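
The final comparison step of the procedure above can be sketched numerically. This is a hedged illustration, not a prescribed read-out: the function name `endpoint_titer`, the rule that a well is positive when its absorbance exceeds a multiple of the negative-control absorbance, and the default factor of 2.0 are all assumptions made for the example, as are the readings.

```python
from typing import Dict, Optional


def endpoint_titer(
    readings: Dict[int, float],
    control_absorbance: float,
    cutoff_factor: float = 2.0,
) -> Optional[int]:
    """Return the highest dilution factor whose absorbance (for example at
    550 nm) exceeds cutoff_factor x the control absorbance, or None."""
    cutoff = control_absorbance * cutoff_factor
    positive = [dilution for dilution, a in readings.items() if a > cutoff]
    return max(positive) if positive else None


# Hypothetical absorbances for increasing dilutions of one serum
# (keys are reciprocal dilutions: 100 means 1/100).
serum = {100: 1.8, 200: 1.2, 400: 0.7, 800: 0.3, 1600: 0.12}
print(endpoint_titer(serum, control_absorbance=0.1))
```

A serum whose wells never rise above the cut-off returns `None`, which would be read as no detectable antibody under these assumed conditions.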

[0271] The invention also relates to a box or kit for the in vitro diagnosis
of an infection by a mycobacterium belonging to the Mycobacterium tuberculosis 
complex, comprising: 

[0272] - a polypeptide according to the invention, 

[0273] - where appropriate, the reagents for constituting the medium 
which is appropriate for the immunological or specific reaction, 

[0274] - the reagents allowing the detection of the antigen-antibody 
complexes produced by the immunological reaction which may be present in the 
biological sample, and the in vitro detection of the antigen-antibody complexes 
which may be formed, it being possible for these reagents to also carry a marker, 
or to be capable of being recognized in turn by a labeled reagent, more 
particularly in the case where the polypeptide according to the invention is not 
labeled, 

[0275] - where appropriate, a reference biological sample (negative 
control) free of antibodies recognized by a polypeptide according to the invention, 

[0276] - where appropriate, a reference biological sample (positive 
control) containing a predetermined quantity of antibodies recognized by a 
polypeptide according to the invention. 
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[0277] The polypeptides according to the invention make it possible to 
prepare monoclonal or polyclonal antibodies which are characterized in that they 
recognize specifically the polypeptides according to the invention. The 
monoclonal antibodies may be advantageously prepared from hybridomas 
according to the technique described by Köhler and Milstein in 1975. The
polyclonal antibodies may be prepared, for example, by immunizing an animal, in 
particular a mouse, with a polypeptide according to the invention combined with 
an immune response adjuvant, and then purifying the specific antibodies 
contained in the serum of the immunized animals on an affinity column to which 
the polypeptide which served as antigen has been attached beforehand. The 
polyclonal antibodies according to the invention may also be prepared by
purifying, on an affinity column on which a polypeptide according to the
invention has been immobilized beforehand, antibodies contained in the serum of
patients infected with a mycobacterium and preferably a bacterium belonging to 
the Mycobacterium tuberculosis complex. 

[0278] The subject of the invention is also mono- or polyclonal antibodies 
or fragments thereof, or chimeric antibodies, characterized in that they are 
capable of recognizing specifically a polypeptide according to the invention. 

[0279] The antibodies of the invention may also be labeled in the same 
manner as described above for the nucleic probes of the invention, such as a 
labeling of the enzymatic, fluorescent or radioactive type. 

[0280] The invention relates, in addition, to a method for the specific 
detection of the presence of an antigen of a mycobacterium and preferably a
bacterium of the Mycobacterium tuberculosis complex in a biological sample,
characterized in that it comprises the following steps:

[0281] a) bringing the biological sample (biological tissue or fluid) 
collected from an individual into contact with a mono- or polyclonal antibody 
according to the invention, under conditions allowing an immunological reaction 
in vitro between said antibodies and the polypeptides specific to mycobacteria 
and preferably bacteria of the Mycobacterium tuberculosis complex which may 
be present in the biological sample, and 

[0282] b) detection of the antigen-antibody complex formed. 

[0283] Also coming within the scope of the invention is a box or kit for the 
in vitro diagnosis, on a biological sample, of the presence of strains of
mycobacteria and preferably of bacteria belonging to the Mycobacterium 
tuberculosis complex, preferably M. tuberculosis, characterized in that it 
comprises: 

[0284] - a polyclonal or monoclonal antibody according to the invention, 
labeled where appropriate; 

[0285] - where appropriate, a reagent for constituting the medium which 
is appropriate for carrying out the immunological reaction; 

[0286] - a reagent allowing the detection of the antigen-antibody 
complexes produced by the immunological reaction, it being possible for this 
reagent to also carry a marker, or to be capable of being recognized in turn by a 
labeled reagent, more particularly in the case where said monoclonal or 
polyclonal antibody is not labeled;

[0287] - where appropriate, reagents for carrying out the lysis of the cells
of the sample tested. 

[0288] The subject of the present invention is also a method for the 
detection and rapid identification of the mycobacteria and preferably of the 
M. tuberculosis bacteria in a biological sample, characterized in that it comprises 
the following steps: 

[0289] a) isolation of the DNA from the biological sample to be 
analyzed, or production of a cDNA from the RNA of the biological sample; 

[0290] b) specific amplification of the DNA of mycobacteria and 
preferably of bacteria belonging to the Mycobacterium tuberculosis complex with 
the aid of primers according to the invention; 

[0291] c) analysis of the products of amplification. 

[0292] The products of amplification may be analyzed by various 
methods. 

[0293] Two methods of analysis are given by way of example below:

[0294] - agarose gel electrophoretic analysis of the products of
amplification. The presence of a DNA fragment which migrates to the expected
position suggests that the sample analyzed contained DNA of mycobacteria
belonging to the tuberculosis complex, or

[0295] - analysis by the molecular hybridization technique using a nucleic
probe according to the invention. This probe will be advantageously labeled with
a nonradioactive (cold probe) or radioactive element.

[0296] For the purposes of the present invention, "DNA of the biological
sample" or "DNA contained in the biological sample" is understood to mean 
either the DNA present in the biological sample considered, or the cDNA 
obtained after the action of a reverse transcriptase-type enzyme on the RNA 
present in said biological sample. 
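
The gel analysis mentioned above depends on knowing where the amplified fragment is expected to migrate, which follows from the positions of the primers on the target sequence. A minimal Python sketch is given here, assuming 1-based inclusive coordinates (the text does not state the numbering convention) and using, as an example, the nucleotide positions 964 to 1234 that the figure descriptions give for the primer pair SEQ ID NO: 528 and SEQ ID NO: 529. The function names are hypothetical.

```python
def expected_amplicon_length(start, end):
    """Length in base pairs of the region start..end (1-based, inclusive; assumed)."""
    if start < 1 or end < start:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    return end - start + 1


def extract_region(sequence, start, end):
    """Return the subsequence start..end using 1-based inclusive coordinates."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


if __name__ == "__main__":
    # Region flanked by the primer pair, as given in the figure descriptions.
    print(expected_amplicon_length(964, 1234))
    # Toy sequence, for illustration only.
    print(extract_region("ACGTACGT", 2, 4))
```

Under the inclusive convention assumed here, the expected fragment length is simply end - start + 1; a different convention would shift the result by one base.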

[0297] Another method of the present invention allows the detection of an 
infection by a mycobacterium and preferably a bacterium of the Mycobacterium 
tuberculosis complex in a mammal. This method comprises the following steps: 

[0298] a) preparation of a biological sample containing cells of said 
mammal, more particularly cells of the immune system of said mammal and still 
more particularly T cells; 

[0299] b) incubation of the biological sample of step a) with a 
polypeptide according to the invention; 

[0300] c) detection of a cellular reaction indicating prior sensitization of 
the mammal to said polypeptide in particular cell proliferation and/or the 
synthesis of proteins such as interferon-gamma; 

[0301] d) detection of a reaction of delayed hypersensitivity or of 
sensitization of the mammal to said polypeptide. 

[0302] This method of detection is an intradermal method which is 
described for example by M.J. Elhay et al. (1988), Infection and Immunity, 66(7):
3454-3456. 

[0303] Another aim of the present invention consists in a method for the 
detection of the mycobacteria and preferably the bacteria belonging to the
Mycobacterium tuberculosis complex in a biological sample, characterized in that
it comprises the following steps:

[0304] a) bringing an oligonucleotide probe according to the invention 
into contact with a biological sample, the DNA contained in the biological sample,
or the cDNA obtained by reverse transcription of the RNA of the biological 
sample, having, where appropriate, been made accessible to the hybridization 
beforehand, under conditions allowing the hybridization of the probe with the 
DNA or the cDNA of the mycobacteria and preferably of the bacteria of the 
Mycobacterium tuberculosis complex; 

[0305] b) detection of the hybrid formed between the oligonucleotide 
probe and the DNA of the biological sample. 

[0306] The invention also relates to a method for the detection of the 
mycobacteria and preferably of the bacteria belonging to the Mycobacterium 
tuberculosis complex in a biological sample, characterized in that it comprises 
the following steps: 

[0307] a) bringing an oligonucleotide probe according to the invention, 
immobilized on a support, into contact with a biological sample, the DNA of the 
biological sample having, where appropriate, been made accessible to the 
hybridization beforehand, under conditions allowing the hybridization of said 
probe with the DNA of the mycobacteria and preferably of the bacteria of the 
Mycobacterium tuberculosis complex; 

[0308] b) bringing the hybrid formed between said oligonucleotide 
probe immobilized on a support and the DNA contained in the biological sample,
where appropriate after removal of the DNA of the biological sample which has
not hybridized with the probe, into contact with a labeled oligonucleotide probe
according to the invention.

[0309] According to an advantageous embodiment of the method of 
detection defined above, it is characterized in that, prior to step a), the DNA of 
the biological sample is amplified beforehand with the aid of a pair of primers 
according to the invention. 

[0310] Another embodiment of the method of detection according to the 
invention consists in a method for the detection of the presence of the 
mycobacteria and preferably the bacteria belonging to the Mycobacterium 
tuberculosis complex in a biological sample, characterized in that it comprises 
the following steps: 

[0311] a) bringing the biological sample into contact with a pair of 
primers according to the invention, the DNA contained in the sample having 
been, where appropriate, made accessible to hybridization beforehand, under 
conditions allowing hybridization of said primers with the DNA of the 
mycobacteria and preferably of the bacteria of the Mycobacterium tuberculosis 
complex; 

[0312] b) amplification of the DNA of a mycobacterium and preferably
of a bacterium of the Mycobacterium tuberculosis complex; 

[0313] c) detection of the amplification of the DNA fragments
corresponding to the fragment flanked by the primers, for example by gel
electrophoresis or by means of an oligonucleotide probe according to the
invention.

[0314] A subject of the invention is also a method for the detection of the 
presence of the mycobacteria and preferably the bacteria belonging to the 
Mycobacterium tuberculosis complex in a biological sample by strand 
displacement, characterized in that it comprises the following steps: 

[0315] a) bringing the biological sample into contact with two pairs of
primers according to the invention specifically intended for amplification of the
SDA type described above, the DNA contained in the sample having been, where
appropriate, made accessible to hybridization beforehand, under conditions 
allowing hybridization of the primers with the DNA of the mycobacteria and 
preferably the bacteria of the Mycobacterium tuberculosis complex; 

[0316] b) amplification of the DNA of the mycobacteria and preferably 
of the bacteria of the Mycobacterium tuberculosis complex; 

[0317] c) detection of the amplification of DNA fragments
corresponding to the fragment flanked by the primers, for example by gel 
electrophoresis or by means of an oligonucleotide probe according to the 
invention. 

[0318] The invention also relates to a box or kit for carrying out the
method described above, intended for the detection of the presence of the 
mycobacteria and preferably the bacteria of the Mycobacterium tuberculosis 
complex in a biological sample, characterized in that it comprises the following 
components:

[0319] a) an oligonucleotide probe according to the invention;

[0320] b) the reagents necessary for carrying out a hybridization
reaction;

[0321] c) where appropriate, a pair of primers according to the 
invention as well as the reagents necessary for a reaction of amplification of the 
DNA (genomic DNA, plasmid DNA or cDNA) of mycobacteria and preferably of 
bacteria of the Mycobacterium tuberculosis complex.

[0322] The subject of the invention is also a kit or box for the detection of 
the presence of the mycobacteria and preferably the bacteria of the 
Mycobacterium tuberculosis complex in a biological sample, characterized in that 
it comprises the following components: 

[0323] a) an oligonucleotide probe, termed capture probe, according 
to the invention; 

[0324] b) an oligonucleotide probe, termed revealing probe, according 
to the invention; 

[0325] c) where appropriate, a pair of primers according to the 
invention as well as the reagents necessary for a reaction of amplification of the 
DNA of mycobacteria and preferably of bacteria of the Mycobacterium 
tuberculosis complex. 

[0326] The invention also relates to a kit or box for the amplification of 
the DNA of the mycobacteria and preferably the bacteria of the Mycobacterium 
tuberculosis complex present in a biological sample, characterized in that it 
comprises the following components:

[0327] a) a pair of primers according to the invention;

[0328] b) the reagents necessary for carrying out a DNA amplification
reaction;

[0329] c) optionally, a component which makes it possible to verify the 
sequence of the amplified fragment, more particularly an oligonucleotide probe 
according to the invention.

[0330] Another subject of the present invention relates to an 
immunogenic composition, characterized in that it comprises a polypeptide
according to the invention. 

[0331] Another immunogenic composition according to the invention is 
characterized in that it comprises one or more polypeptides according to the 
invention and/or one or more hybrid polypeptides according to the invention. 

[0332] According to an advantageous embodiment, the above-defined 
immunogenic composition constitutes a vaccine when it is provided in 
combination with a pharmaceutically acceptable vehicle and optionally one or 
more immunity adjuvants such as alum or a representative of the family of 
muramyl peptides or alternatively incomplete Freund's adjuvant.

[0333] Various types of vaccine are currently available for protecting
humans against infectious diseases: attenuated live microorganisms (M. bovis
BCG for tuberculosis), inactivated microorganisms (influenza virus), acellular
extracts (Bordetella pertussis for whooping cough), recombinant proteins
(hepatitis B virus surface antigen), polysaccharides (pneumococci). Experiments
are being carried out on vaccines prepared from synthetic peptides or genetically
modified microorganisms expressing heterologous antigens. More recently still,
recombinant plasmid DNAs carrying genes encoding protective antigens have
been proposed as an alternative vaccine strategy. This type of vaccination is
carried out with a specific plasmid which is derived from an E. coli plasmid which
does not replicate in vivo and which encodes only the vaccinal protein. The
principal functional components of this plasmid are: a strong promoter allowing
expression in eukaryotic cells (for example that of CMV), an appropriate cloning
site for inserting the gene of interest, a termination-polyadenylation sequence, a
prokaryotic replication origin for producing the recombinant plasmid in vitro and a
selectable marker (for example the ampicillin-resistance gene) for facilitating the
selection of the bacteria which contain the plasmid. Animals were immunized by
simply injecting the naked plasmid DNA into the muscle. This technique leads to
the expression of the vaccinal protein in situ and to an immune response in
particular of the cellular type (CTL) and of the humoral type (antibody). This
double induction of the immune response is one of the main advantages of the
vaccination technique with naked DNA. Huygen et al. (1996) and Tascon et al.
(1996) succeeded in obtaining a degree of protection against M. tuberculosis by
injecting recombinant plasmids containing M. leprae genes (hsp65, 36kDa pra)
as inserts. M. leprae is the agent responsible for leprosy. The use of an insert
specific to M. tuberculosis such as, for example, the whole or part of the DP428
gene, which is the subject of the present invention, would probably lead to a
better protection against tuberculosis. The whole or part of the DP428 gene, or
any polynucleotide according to the invention, can be easily inserted into the
plasmid vectors V1J (Montgomery et al., 1993), pcDNA3 (Invitrogen,
R&D Systems) or pcDNA1/Neo (Invitrogen) which possess the necessary
characteristics for a vaccinal use.

[0334] The invention thus relates to a vaccine, characterized in that it 
comprises one or more polypeptides according to the invention and/or one or 
more hybrid polypeptides according to the invention as previously defined, in 
combination with a pharmaceutically compatible vehicle and, where appropriate, 
one or more appropriate immunity adjuvants. 

[0335] The invention also relates to a vaccine composition intended for 
the immunization of humans or animals against a bacterial or viral infection, such 
as tuberculosis or hepatitis, characterized in that it comprises one or more hybrid 
polypeptides as previously defined in combination with a pharmaceutically
compatible vehicle and, where appropriate, one or more immunity adjuvants. 

[0336] Advantageously, in the case of a protein which is a hybrid 
between a polypeptide according to the invention and the hepatitis B surface 
antigen, the vaccine composition will be administered, in humans, in an amount 
of 0.1 to 1 µg of purified hybrid protein per kilogram of the weight of the patient,
preferably 0.2 to 0.5 µg/kg of the weight of the patient, for a dose intended for a
given administration. In the case of patients suffering from disorders of the 
immune system, in particular immunosuppressed patients, each injected dose 
will preferably contain half of the quantity, by weight, of the hybrid protein 
contained in a dose intended for a patient not suffering from immune system 
disorders.

[0337] Preferably, the vaccine composition will be administered several
times, spread out over time, by the intradermal or subcutaneous route. By way
of example, three doses as defined above will be administered, respectively, to
the patient at time t0, at time t0 + 1 month and at time t0 + 1 year.

[0338] Alternatively, three doses will be administered, respectively, to the 
patient at time t0, at time t0 + 1 month and at time t0 + 6 months.
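
The per-kilogram dose ranges and the two three-dose schedules given above reduce to simple arithmetic. The following Python sketch illustrates that arithmetic only. The function names, the example body weight and the rule of clamping a date to the end of a shorter month are assumptions introduced for illustration; they do not come from the text.

```python
import calendar
from datetime import date

# Per-kilogram ranges stated in the text (micrograms of hybrid protein per kg).
BROAD_RANGE_UG_PER_KG = (0.1, 1.0)
PREFERRED_RANGE_UG_PER_KG = (0.2, 0.5)


def dose_range_ug(weight_kg, immunosuppressed=False, per_kg=BROAD_RANGE_UG_PER_KG):
    """Return (low, high) micrograms of hybrid protein for one administration."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    # The text halves the quantity for patients with immune system disorders.
    factor = 0.5 if immunosuppressed else 1.0
    low, high = per_kg
    return (low * weight_kg * factor, high * weight_kg * factor)


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's end (assumed rule)."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def dose_dates(t0, offsets_months=(0, 1, 12)):
    """Dates of the three doses: by default t0, t0 + 1 month and t0 + 1 year."""
    return [add_months(t0, m) for m in offsets_months]


if __name__ == "__main__":
    # Hypothetical 70 kg patient and start date.
    print(dose_range_ug(70))
    print(dose_range_ug(70, immunosuppressed=True, per_kg=PREFERRED_RANGE_UG_PER_KG))
    print(dose_dates(date(2024, 1, 15)))             # t0, +1 month, +1 year
    print(dose_dates(date(2024, 1, 15), (0, 1, 6)))  # t0, +1 month, +6 months
```

Passing `(0, 1, 6)` as the offsets gives the alternative schedule; the halving factor applies to whichever per-kilogram range is selected.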

[0339] In mice, in which a weight dose of the vaccine composition 
comparable to the dose used in humans is administered, the antibody reaction is 
tested by collecting serum followed by a study of the formation of a complex 
between the antibodies present in the serum and the antigen of the vaccine 
composition, according to the customary techniques. 

[0340] The invention also relates to an immunogenic composition 
characterized in that it comprises a polynucleotide or an expression vector 
according to the invention, in combination with a vehicle allowing its 
administration to humans or animals. 

[0341] The subject of the invention is also a vaccine intended for 
immunizing against a bacterial or viral infection, such as tuberculosis or hepatitis, 
characterized in that it comprises a polynucleotide or an expression vector 
according to the invention, in combination with a pharmaceutically acceptable
vehicle. 

[0342] Such immunogenic or vaccine compositions are in particular 
described in international application No. WO 90/11092 (Vical Inc.) and also in
international application No. WO 95/11307 (Institut Pasteur).

[0343] The constituent polynucleotide of the immunogenic composition or
of the vaccine composition according to the invention may be injected into the
host after having been coupled with compounds which promote the penetration
of this polynucleotide into the cell or its transport to the cell nucleus. The
resulting conjugates may be encapsulated into polymer microparticles, as
described in international application No. WO 94/27238 (Medisorb Technologies
International).

[0344] According to another embodiment of the immunogenic and/or 
vaccine composition according to the invention, the polynucleotide, preferably a 
DNA, is complexed with DEAE-dextran (Pagano et al., 1967) or with nuclear 
proteins (Kaneda et al., 1989), with lipids (Felgner et al., 1987) or encapsulated
into liposomes (Fraley et al., 1980). 

[0345] According to yet another advantageous embodiment of the 
immunogenic and/or vaccine composition according to the invention, the 
polynucleotide according to the invention may be introduced in the form of a gel 
facilitating its transfection into cells. Such a composition in gel form may be a 
poly-L-lysine and lactose complex, as described by Midoux in 1993, or 
Poloxamer 407™, as described by Pastore in 1994. The polynucleotide or the 
vector according to the invention may also be in suspension in a buffer solution 
or may be combined with liposomes. 

[0346] Advantageously, such a vaccine will be prepared in accordance 
with the technique described by Tascon et al. or Huygen et al. in 1996 or in
accordance with the technique described by Davis et al. in international
application No. WO 95/11307 (Whalen et al.).

[0347] Such a vaccine will be advantageously prepared in the form of a 
composition containing a vector according to the invention, placed under the 
control of regulatory elements allowing its expression in humans or animals. 

[0348] To produce such a vaccine, the polynucleotide according to the 
invention is first of all subcloned into an appropriate expression vector, 
particularly an expression vector containing regulatory and expression signals 
recognized by the enzymes in eukaryotic cells and also containing a replication 
origin which is active in prokaryotes, for example in E. coli, which allows its prior 
amplification. The purified recombinant plasmid obtained is then injected into the 
host, for example by the intramuscular route. 

[0349] It will be possible, for example, to use as vector for expressing 
in vivo the antigen of interest the plasmid pcDNA3 or the plasmid pcDNA1/Neo,
both marketed by Invitrogen (R&D Systems, Abingdon, United Kingdom). It is
also possible to use the plasmid V1Jns.tPA described by Shiver et al. in 1995.

[0350] Such a vaccine will advantageously comprise, in addition to the 
recombinant vector, a saline solution, for example a sodium chloride solution. 

[0351] A vaccine composition as defined above will be, for example, 
administered by the parenteral route or by the intramuscular route. 

[0352] The present invention also relates to a vaccine characterized in
that it contains one or more nucleotide sequences according to the invention
and/or one or more polynucleotides as mentioned above in combination with a
pharmaceutically compatible vehicle and, where appropriate, one or more
appropriate immunity adjuvants.

PATENT
Customer No. 22,852
Attorney Docket No. 03715-0062-01

[0353] Another aspect relates to a method of screening molecules
capable of inhibiting the growth of mycobacteria or the maintenance of
mycobacteria in a host, characterized in that said molecules block the synthesis
or the function of the polypeptides encoded by a nucleotide sequence according
to the invention or by a polynucleotide as described supra.

[0354] In said method of screening, the molecules may be anti- 
messengers or may induce the synthesis of anti-messengers. 

[0355] The present invention also relates to molecules capable of 
inhibiting the growth of mycobacteria or the maintenance of mycobacteria in a 
host, characterized in that said molecules are synthesized based on the structure 
of the polypeptides encoded by a nucleotide sequence according to the invention 
or by a polynucleotide as described supra. 

[0356] Other characteristics and advantages of the invention appear in 
the following examples and figures: 
FIGURES 

The Figure 1 series: 

[0357] The Figure 1 series illustrates the series of nucleotide sequences 
SEQ ID NOS: 1, 8, 14, 25, 31, and 33 corresponding to the insert of the vector
pDP428 (deposited at the CNCM under the No. I-1818) and the series of amino
acid sequences SEQ ID NOS: 2-7, 9-13, 15-24, 26-30, 32, 34 of the polypeptides
encoded by the series of nucleotide sequences SEQ ID NOS: 1, 8, 14, 25, 31,
and 33.

Figure 2: 

[0358] Illustrates the nucleotide sequence SEQ ID NO: 35 corresponding 
to the region including the gene encoding the polypeptide DP428 (region 
underlined). Both the ATG and GTG codons for initiation of translation were
taken into account in this figure. The figure shows that the polypeptide DP428 is 
probably part of an operon comprising at least three genes. The double-boxed 
region probably includes the promoter regions. 

[0359] The single-boxed region corresponds to the motif LPISG (SEQ ID 
NO: 934) which resembles the motif LPXTG (SEQ ID NO: 935) described in 
Gram-positive bacteria as allowing anchorage to peptidoglycans. 
The Figure 3 series: 

[0360] The Figure 3 series represents the series of nucleotide sequences 
SEQ ID NOS: 41, 46, 52 corresponding to the insert of the vector p6D7 
(deposited at the CNCM under the No. I-1814) and the series of amino acid
sequences SEQ ID NOS: 42-45, 47-51, and 53-55.
The Figure 4 series: 

[0361] The Figure 4 series represents the series of nucleotide sequences 
SEQ ID NOS: 56, 62, 64, 67, 69, 72, 74, 76, 78, 81, 84, and 86 corresponding to
the insert of the vector p5A3 (deposited at the CNCM under the No. I-1815) and
the series of amino acid sequences SEQ ID NOS: 57-61, 63, 65-66, 68, 70-71,
73, 75, 77, 79-80, 82-83, 85, and 87.
The Figure 5 series:

[0362] The Figure 5 series represents the series of nucleotide sequences 
SEQ ID NOS: 88, 90, 92, 96, 98, 100, 104, 106, and 108 corresponding to the 
insert of the vector p5F6 (deposited at the CNCM under the No. I-1816) and the
series of amino acid sequences SEQ ID NOS: 93-95, 97, 99, 101-103, 105, 107, 
and 109. 

The Figure 6 series: 

[0363] The Figure 6 series represents the series of nucleotide sequences 
SEQ ID NOS: 110, 113, and 119 corresponding to the insert of the vector p2A29
(deposited at the CNCM under the No. I-1817) and the series of amino acid
sequences SEQ ID NOS: 111-112, 114-118, and 120-121. 
The Figure 7 series: 

[0364] The Figure 7 series represents the series of nucleotide sequences 
SEQ ID NOS: 122, 128, and 133 corresponding to the insert of the vector p5B5
(deposited at the CNCM under the No. I-1819) and the series of amino acid
sequences SEQ ID NOS: 123-127, 129-132, and 134-136. 
The Figure 8 series: 

[0365] The Figure 8 series represents the series of nucleotide sequences 
SEQ ID NOS: 137, 139, 141, 143, 145, 148, 150, 152, 154, and 156 
corresponding to the insert of the vector p1C7 (deposited at the CNCM under the
No. I-1820) and the series of amino acid sequences SEQ ID NOS: 138, 272-273,
140, 142, 144, 146-147, 149, 151, 153, 155, and 157.
The Figure 9 series:

[0366] The Figure 9 series represents the series of nucleotide sequences 
SEQ ID NOS: 158, 160, and 162 corresponding to the insert of the vector p2D7 
(deposited at the CNCM under the No. I-1821) and the series of amino acid
sequences SEQ ID NOS: 159, 161, 163, and 164. 
The Figure 10 series: 

[0367] The Figure 10 series represents the series of nucleotide 
sequences SEQ ID NOS: 165, 169, and 177 corresponding to the insert of the 
vector p1B7 (deposited at the CNCM under the No. I-1843) and the series of
amino acid sequences SEQ ID NOS: 166-168, 170-176, 178-183. 
The Figure 11 series:

[0368] The Figure 11 series represents the series of nucleotide
sequences SEQ ID NOS: 184, 189, 195, 200, 202, 206, 209, and 211 and the
series of amino acid sequences SEQ ID NOS: 185-188, 190-194, 196-199, 201,
203-205, 207-208, 210, and 212.
The Figure 12 series: 

[0369] The Figure 12 series represents the series of nucleotide 
sequences SEQ ID NOS: 213, 217, and 220 and the series of amino acid 
sequences SEQ ID NOS: 214-216, 218-219, and 221-224. 
The Figure 13 series: 

[0370] The Figure 13 series represents the series of nucleotide 
sequences SEQ ID NOS: 225, 228, 238, 246, 250, 255, 258, and 260 and the
series of amino acid sequences SEQ ID NOS: 226-227, 923-925, 229-237, 239-245,
247-249, 251-254, 256-257, 259, and 261.

The Figure 14 series: 

[0371] The Figure 14 series represents the series of nucleotide 
sequences SEQ ID NOS: 262, 268, 274, 278, 280, 282, 284, 286, 288, 297, 290,
and 310 corresponding to the insert of the vector p5B5 (deposited at the CNCM
under the No. I-1819) and the series of amino acid sequences SEQ ID NOS:
263-267, 269-271, 275-277, 279, 281, 283, 285, 287, 289, 291-296, 298-309,
and 311-316.
The Figure 15 series: 

[0372] The Figure 15 series represents the series of nucleotide 
sequences SEQ ID NOS: 317, 321, 323, 325, 327, 331, 333, 335, 337, 339, 346, 
and 347 and the series of amino acid sequences SEQ ID NOS: 318-320, 322, 
324, 326, 328-330, 332, 334, 336, 338, 340-345, and 348-352.
The Figure 16 series: 

[0373] The Figure 16 series represents the series of nucleotide 
sequences SEQ ID NOS: 353, 357, and 359 and the series of amino acid 
sequences SEQ ID NOS: 354-356, 358, 360, and 926-930.
The Figure 17 series: 

[0374] The Figure 17 series represents the series of nucleotide 
sequences SEQ ID NOS: 361, 364, 368, 371, 374, 380, 383, and 385 and the
series of amino acid sequences SEQ ID NOS: 362-363, 365-367, 369-370, 372-373,
375-379, 381-382, 384, and 386.
The Figure 18 series:

[0375] The Figure 18 series represents the series of nucleotide 
sequences SEQ ID NOS: 387, 389, 393, 395, 397, 399, 403, and 405 and the 
series of amino acid sequences SEQ ID NOS: 388, 390-392, 394, 396, 398,
400-402, 404, and 406.
The Figure 19 series: 

[0376] The Figure 19 series represents the series of nucleotide 
sequences SEQ ID NOS: 407, 410, 412, 419, 421, 426, 429, and 431 and the 
series of amino acid sequences SEQ ID NOS: 408-409, 411, 413-418, 420,
422-425, 427-428, 430, and 432.
The Figure 20 series: 

[0377] The Figure 20 series represents the series of nucleotide 
sequences SEQ ID NOS: 433, 437, 441, 447, 452, 456, 459, and 461
corresponding to the insert of the vector p2A29 (deposited at the CNCM under
the No. I-1817) and the series of amino acid sequences SEQ ID NOS: 434-436,
438-440, 442-446, 448-451, 453-455, 457-458, 460, and 462. 
The Figure 21 series: 

[0378] The Figure 21 series represents the series of nucleotide 
sequences SEQ ID NOS: 463, 469, 472, 474, 476, 482, 485, and 487 and the 
series of amino acid sequences SEQ ID NOS: 464-468, 470, 471, 473, 475,
477-481, 483-484, 486, and 488.
The Figure 22 series:

[0379] The Figure 22 series represents the series of nucleotide 
sequences SEQ ID NOS: 489, 495, and 497 and the series of amino acid 
sequences SEQ ID NOS: 490-494, 496, and 498-500. 
The Figure 23 series: 

[0380] The Figure 23 series represents the series of nucleotide 
sequences SEQ ID NOS: 501, 505, and 510 and the series of amino acid
sequences SEQ ID NOS: 502-504, 506-509, and 511-515.
The Figure 24 series: 

[0381] The Figure 24 series represents the series of nucleotide 
sequences SEQ ID NOS: 516, 519, and 522 and the series of amino acid 
sequences SEQ ID NOS: 517-518, 520-521, and 523-527.
Figures 25 and 26: 

[0382] Figures 25 and 26 illustrate, respectively, the sequences 
SEQ ID NO: 528 and SEQ ID NO: 529 representing a pair of primers used to 
specifically amplify, by PCR, the region corresponding to nucleotides 964 to 1234
included in the sequence SEQ ID NOS: 1, 8, 14, 25, 31, and 33.
The Figure 27 series: 

[0383] The Figure 27 series represents the series of nucleotide 
sequences SEQ ID NOS: 530, 534, and 537 corresponding to the insert of the 
vector p5A3 and the series of amino acid sequences SEQ ID NOS: 531-533, 
535-536, and 538-542.
Figure 28:

[0384] The amino acid sequence as defined in Figure 28 represents the 
amino acid sequence SEQ ID NO: 543 corresponding to the polypeptide DP428. 
Figure 29: 

[0385] Figure 29 represents the nucleotide sequence SEQ ID NO: 544 of 
the complete gene encoding the M1C25 protein. 
Figure 30: 

[0386] Figure 30 represents the amino acid sequence SEQ ID NO: 545 
of the M1C25 protein. 
The Figure 31 series: 

[0387] The Figure 31 series represents the series of nucleotide 
sequences SEQ ID NOS: 546, 550, 552, and 554 and the series of amino acid 
sequences SEQ ID NOS: 547-549, 551, 553, and 555. 
The Figure 32 series: 

[0388] The Figure 32 series represents the series of nucleotide 
sequences SEQ ID NOS: 556, 558, 564, 569, and 571 and the series of amino 
acid sequences SEQ ID NOS: 557, 559-563, 565-568, 570, and 572. 
The Figure 33 series: 

[0389] The Figure 33 series represents the series of nucleotide 
sequences SEQ ID NOS: 573, 576, 580, 584, and 586 and the series of amino 
acid sequences SEQ ID NOS: 574-575, 577-579, 581-583, 585, and 587. 
The Figure 34 series: 

[0390] The Figure 34 series represents the series of nucleotide 
sequences SEQ ID NOS: 588, 590, 594, and 596 and the series of amino acid 
sequences SEQ ID NOS: 587, 589, 591-593, 595, and 597. 
The Figure 35 series: 

[0391] The Figure 35 series represents the series of nucleotide 
sequences SEQ ID NOS: 598, 600, 604, 608, and 610 and the series of amino 
acid sequences SEQ ID NOS: 599, 601-603, 605-607, 609, and 611. 
The Figure 36 series: 

[0392] The Figure 36 series represents the series of nucleotide 
sequences SEQ ID NOS: 612, 614, 616, 618, and 620 and the series of amino 
acid sequences SEQ ID NOS: 613, 615, 617, 619, and 621. 
The Figure 37 series: 

[0393] The Figure 37 series represents the series of nucleotide 
sequences SEQ ID NOS: 622, 624, 626, 629, and 631 and the series of amino 
acid sequences SEQ ID NOS: 623, 625, 627-628, 630, and 632. 
The Figure 38 series: 

[0394] The Figure 38 series represents the series of nucleotide 
sequences SEQ ID NOS: 633, 635, 640, 647, and 649, and the series of amino 
acid sequences SEQ ID NOS: 634, 636-639, 641-646, 648, and 650. 
The Figure 39 series: 

[0395] The Figure 39 series represents the series of nucleotide 
sequences SEQ ID NOS: 651, 653, 657, 660, and 662 and the series of amino 
acid sequences SEQ ID NOS: 652, 654-656, 658-659, 661, and 663. 
The Figure 40 series: 

[0396] The Figure 40 series represents the series of nucleotide 
sequences SEQ ID NOS: 664, 666, 669, 674, and 676, and the series of amino 
acid sequences SEQ ID NOS: 665, 931-933, 667-668, 670-673, 675, and 677. 
The Figure 41 series: 

[0397] The Figure 41 series represents the series of nucleotide 
sequences SEQ ID NOS: 678, 683, 686, 691, 693, 695, 697, 702, and 717 
corresponding to the insert of the vector p2D7 (deposited at the CNCM under the 
No. I-1821) and the series of amino acid sequences SEQ ID NOS: 679-682, 684, 
685, 687-690, 692, 694, 696, 698-701, 703-716, and 718-727. 
The Figure 42 series: 

[0398] The Figure 42 series represents the series of nucleotide 
sequences SEQ ID NOS: 728, 733, 736, 739, and 741 and the series of amino 
acid sequences SEQ ID NOS: 729-732, 734-735, 737-738, 740, and 742. 
The Figure 43 series: 

[0399] The Figure 43 series represents the series of nucleotide 
sequences SEQ ID NOS: 743, 746, 752, 755, and 757 and the series of amino 
acid sequences SEQ ID NOS: 744-745, 747-751, 753-754, 756, and 758. 
The Figure 44 series: 

[0400] The Figure 44 series represents the series of nucleotide 
sequences SEQ ID NOS: 759, 761, 764, 767, and 769, and the series of amino 
acid sequences SEQ ID NOS: 760, 762, 763, 765-766, 768, and 770. 
The Figure 45 series: 

[0401] The Figure 45 series represents the series of nucleotide 
sequences SEQ ID NOS: 771, 784, 794, 805, 807, and 809 and the series of 
amino acid sequences SEQ ID NOS: 772-783, 785-793, 795-804, 806, 808, and 
810. 

The Figure 46 series: 

[0402] The Figure 46 series represents the series of nucleotide 
sequences SEQ ID NOS: 811, 813, 817, 821, and 823 and the series of amino 
acid sequences SEQ ID NOS: 812, 814-816, 818-820, 822, and 824. 
The Figure 47 series: 

[0403] The Figure 47 series represents the series of nucleotide 
sequences SEQ ID NOS: 825, 827, 831, 833, and 835 and the series of amino 
acid sequences SEQ ID NOS: 826, 828-830, 832, 834, and 836. 
The Figure 48 series: 

[0404] The Figure 48 series represents the series of nucleotide 
sequences SEQ ID NOS: 837, 839, 842, 844, and 846 and the series of amino 
acid sequences SEQ ID NOS: 838, 840-841, 843, 845, and 847. 
The Figure 49 series: 

[0405] The Figure 49 series represents the series of nucleotide 
sequences SEQ ID NOS: 848, 864, 878, 883, and 885 and the series of amino 
acid sequences SEQ ID NOS: 849-863, 865-877, 879, 880-882, 884, and 886. 
The Figure 50 series: 

[0406] The Figure 50 series represents the series of nucleotide 
sequences SEQ ID NOS: 887, 895, 901, 907, and 909 and the series of amino 
acid sequences SEQ ID NOS: 888-894, 896-900, 902-906, 908, and 910. 
Figure 51: 

[0407] A. Construct pJVED: shuttle plasmid (capable of multiplying in 
mycobacteria as well as in E. coli) with a kanamycin-resistance gene (derived 
from Tn903) as a selectable marker. The truncated phoA gene (ΔphoA) and the 
luc gene form a synthetic operon. 

[0408] B. Joining sequence (SEQ ID NO: 922) between phoA and luc. 
Figure 52: 

[0409] Genomic hybridization (Southern blotting) of the genomic DNA of 
various mycobacterial species with the aid of an oligonucleotide probe whose 
sequence is the sequence between the nucleotide at position nt 964 (5' end of 
the probe) and the nucleotide at position nt 1234 (3' end of the probe), ends 
included, of the sequence SEQ ID NOS: 1, 8, 14, 25, 31, and 33. 
Figures 53 and 54: 

[0410] Luc and PhoA activities of recombinant M. smegmatis containing 
pJVED with various nucleotide fragments as described in the examples. 
Figures 53 and 54 represent the results obtained for two separate experiments 
carried out under the same conditions. 

Figure 55: 

[0411] Representation of the hydrophobicity (Kyte and Doolittle) of the 
coding sequence of the polypeptide DP428 with its schematic representation. 
The LPISG (SEQ ID NO: 934) motif immediately precedes the hydrophobic C- 
terminal region. The sequence ends with two arginines. 
Figure 56: 

[0412] Representation of the hydrophobicity (Kyte and Doolittle) of the 
sequence of the polypeptide M1C25 having the amino acid sequence 
SEQ ID NO: 545. 
Figure 57: 

[0413] A- Acrylamide gel (12%) under denaturing conditions of a 
bacterial extract obtained by sonication of E. coli M15 bacteria containing the 
plasmid pM1C25 without and after 4 hours of induction with IPTG, stained with 
Coomassie Blue. 

[0414] Lane 1: Molecular weight marker (Prestained SDS-PAGE 
Standards High Range BIO-RAD®). 

[0415] Lane 2: Bacterial extract obtained by sonication of E. coli M15 
bacteria containing the plasmid pM1C25 without induction with IPTG. 

[0416] Lane 3: Bacterial extract obtained by sonication of E. coli M15 
bacteria containing the plasmid pM1C25 after 4 hours of induction with IPTG. 

[0417] Lane 4: Molecular weight marker (Prestained SDS-PAGE 
Standards Low Range BIO-RAD®). 

[0418] B - Western blotting of a similar gel (12% acrylamide) visualized 
by means of the penta-His antibody marketed by the company Qiagen. 

[0419] Lane 1: Representation of the molecular weight marker 
(Prestained SDS-PAGE Standards High Range BIO-RAD®). 

[0420] Lane 2: Bacterial extract obtained by sonication of E. coli M15 
bacteria containing the plasmid pM1C25 without induction with IPTG. 

[0421] Lane 3: Bacterial extract obtained by sonication of E. coli M15 
bacteria containing the plasmid pM1C25 after 4 hours of induction with IPTG. 

[0422] Lane 4: Representation of the molecular weight marker 
(Prestained SDS-PAGE Standards Low Range BIO-RAD®). 

[0423] The band which is most predominantly present in the lanes 
corresponding to the bacteria induced with IPTG compared with those not 
induced with IPTG, between 34,200 and 28,400 daltons, corresponds to the 
expression of the insert M1C25 cloned into the vector pQE-60 (Qiagen®). 

[0424] As regards the legend to the other figures which are numbered by 
an alphanumeric character, each of these other figures represents the nucleotide 
sequence and the amino acid sequence having the SEQ ID sequence whose 
numbering is identical to the alphanumeric character of each of said figures. 

[0425] The alphanumeric numberings of the figures representing the 
SEQ IDs comprising a number followed by a letter have the following meanings: 

[0426] - the alphanumeric numberings having the same number relate to 
the same family of sequences attached to the reference SEQ ID sequence 
whose numbering has this same number and the letter A; 

[0427] - the letters A, B and C for the same family of sequences 
distinguish the three possible reading frames of the reference SEQ ID nucleotide 
sequence (A); 

[0428] - the letters with a prime (') index mean that the sequence 
corresponds to a fragment of the reference SEQ ID sequence (A); 

[0429] - the letter D means that the sequence corresponds to the 
sequence of the gene predicted by Cole et al., 1998; 

[0430] - the letter F means that the sequence corresponds to the open 
reading frame (ORF) containing the corresponding "D" sequence according to 
Cole et al., 1998; 

[0431] - the letter G means that the sequence is a sequence predicted by 
Cole et al., 1998, and exhibiting a homology of more than 70% with the reference 
SEQ ID sequence (A); 

[0432] - the letter H means that the sequence corresponds to the open 
reading frame containing the corresponding "G" sequence according to 
Cole et al., 1998; 

[0433] - the letter R means that the sequence corresponds to a sequence 
predicted by Cole et al., 1998, upstream of the corresponding "D" sequence and 
capable of being in phase with the sequence "D" because of possible sequencing 
errors; 

[0434] - the letter P means that the sequence corresponds to the open 
reading frame containing the corresponding "R" sequence; 

[0435] - the letter Q means that the sequence corresponds to a 
sequence containing the corresponding "F" and "P" sequences. 
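
The letters A, B and C above distinguish the three possible reading frames of a reference nucleotide sequence. As an illustration only, the following minimal Python sketch translates a sequence in its three forward frames using the standard genetic code. The function names, the mapping of the labels A, B and C to offsets 0, 1 and 2, and the nine-nucleotide test sequence are assumptions made for the example and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, offset=0):
    """Translate seq from a 0-based offset; an incomplete trailing codon is dropped."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    # Codons containing ambiguous bases are reported as "X".
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(seq):
    """Return the three forward-frame translations (labels A, B, C assumed)."""
    return {label: translate(seq, offset) for label, offset in zip("ABC", range(3))}


# Hypothetical test sequence, not a sequence from the listing.
frames = three_frames("ATGGCCTAA")
```

A stop codon is reported as `*`, so an open reading frame appears as a run of letters between two `*` characters in one of the three strings.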

[0436] As regards the sequence family SEQ ID NOS: 56-87, the 
preceding insert phoA contains two fragments which are noncontiguous on the 
genome, SEQ ID NO: 76 and SEQ ID NO: 56, and which are therefore derived 
from a multiple cloning allowing the expression and export of phoA. These two 
noncontiguous fragments, the genes and the open reading frames containing 
them according to Cole et al., 1998, are important for the export of an antigenic 
polypeptide: 

[0437] - the letters J, K and L distinguish the three possible reading 
frames of the corresponding nucleotide sequence "J"; 

[0438] - the letter M means that the sequence corresponds to the 
sequence predicted by Cole et al., 1998, and containing the sequence 
SEQ ID NO: 77; 

[0439] - the letter N means that the sequence corresponds to the open 
reading frame containing the sequence SEQ ID NO: 84. 

[0440] As regards the sequence family SEQ ID NOS: 771-810, the letter 
Z means that the sequence corresponds to the sequence of a cloned fragment 
fused with phoA. 

[0441] Finally, as regards the sequence family SEQ ID NOS: 678-727, 
the letter S means that the sequence corresponds to a sequence predicted by 
Cole et al., 1998 and which may be in the same reading frame as the 
corresponding sequence "D", the letter T meaning that the corresponding 
sequence contains the corresponding sequences "P" and "S". 

EXAMPLES 


Materials and methods 

Bacterial cultures, plasmids and culture media 

[0442] E. coli was cultured on Luria-Bertani (LB) solid or liquid medium. 
M. smegmatis was cultured on Middlebrook 7H9 liquid medium (Difco) 
supplemented with albumin-dextrose (ADC), 0.2% glycerol and 0.05% Tween, or 
on solid L medium. If necessary, the antibiotic kanamycin was added at a 
concentration of 20 µg/ml. The bacterial clones having a PhoA activity were 
detected on LB agar containing 5-bromo-4-chloro-3-indolyl phosphate (X-P, at 
40 µg/ml). 

Manipulation of DNA and sequencing 

[0443] The manipulations of DNA and the Southern blot analyses were 
carried out using the standard techniques (Sambrook et al., 1989). The 
double-stranded DNA sequences were determined with a Taq Dye Deoxy Terminator 
Cycle sequencing kit (Applied Biosystems), in a System 9600 GeneAmp PCR 
(Perkin-Elmer), and after migration on a model 373 DNA analyzing system 
(Applied Biosystems). 

Constructions of the plasmids 

[0444] The plasmid pJVEDa was constructed from pLA71, a transfer 
plasmid comprising the phoA gene which is truncated and placed in phase with 
BlaF. pLA71 was cleaved with the restriction enzymes KpnI and NotI, thus 
removing phoA without affecting the promoter of BlaF. The luc gene encoding 
the firefly luciferase was amplified from pGEM-luc and a ribosome-binding site 
was added. phoA was amplified from pJEM11. The amplified fragments were 
cleaved with PstI and ligated together. The oligodeoxynucleotides used are the 
following: 

[0445] pPV.luc.Fw: 5'GACTGCTGCAGAAGGAGAAGATCCAAATGG3' 
(SEQ ID NO: 911) 

[0446] luc.Bw: 5'GACTAGCGGCCGCGAATTCGTCGACCTCCGAGG3' 
(SEQ ID NO: 912) 

[0447] pJEM.phoA.Fw: 5'CCGCGGATCCGGATACGTAC3' 
(SEQ ID NO: 913) 

[0448] phoA.Bw: 5'GACTGCTGCAGTTTATTTCAGCCCCAGAGCG3' 
(SEQ ID NO: 914). 
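
The cloning steps described here rely on restriction sites carried by the oligodeoxynucleotides. The short Python sketch below is offered only as a way of checking which of those sites occur in the four primers listed above. The recognition sequences are the standard ones for the named enzymes; the function name `find_sites` and the choice of enzymes to scan are assumptions made for this example, not part of the described method.

```python
# Standard recognition sequences for enzymes named in the text.
SITES = {
    "KpnI": "GGTACC",
    "NotI": "GCGGCCGC",
    "PstI": "CTGCAG",
    "BamHI": "GGATCC",
    "SnaBI": "TACGTA",
}

# Primer sequences as given in paragraphs [0445] to [0448].
PRIMERS = {
    "pPV.luc.Fw": "GACTGCTGCAGAAGGAGAAGATCCAAATGG",
    "luc.Bw": "GACTAGCGGCCGCGAATTCGTCGACCTCCGAGG",
    "pJEM.phoA.Fw": "CCGCGGATCCGGATACGTAC",
    "phoA.Bw": "GACTGCTGCAGTTTATTTCAGCCCCAGAGCG",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each site present in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
```

Because every site scanned here is palindromic, searching one strand is sufficient; a non-palindromic site would also require a search of the reverse complement.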

[0449] The fragment thus obtained was reamplified using the 
oligonucleotides complementary to its ends, cleaved with KpnI and NotI, and 
integrated into pLA71 cleaved with the same enzymes. The resulting construct 
was electroporated into E. coli DH5α and M. smegmatis mc²155. An 
M. smegmatis clone emitting light and having a phoA activity was selected and 
called pJVED/blaF. The insert was removed using BamHI and the construct 
closed again on itself, thus reconstructing pJVEDa. To obtain pJVEDb,c, the 
multiple cloning site was cleaved with ScaI and KpnI and closed again, removing 
one (pJVEDb) or two (pJVEDc) nucleotides from the SnaBI site. After fusion, it 
was thus possible to obtain six reading frames. The insert of pJVED/hsp18 was 
obtained by polymerase chain reaction (PCR) amplification of pPM1745 
(Servant et al., 1995) using oligonucleotides having the sequence: 

[0450] 18.Fw: 5'GTACCAGTACTGATCACCCGTCTCCCGCAC3' 
(SEQ ID NO: 915) 

[0451] 18.Back: 5'AGTCAGGTACCTCGCGGAAGGGGTCAGTGCG3' 
(SEQ ID NO: 916) 

[0452] The product was cleaved with KpnI and ScaI, and ligated to 
pJVEDa, cleaved with the same enzymes, thus yielding pJVED/hsp18. 

[0453] pJVED/P19kDa and pJVED/erp were constructed by excising with 
BamHI the insert of pExp410 and pExp53, respectively, and inserting them into 
the BamHI site of the multiple cloning site of pJVEDa. 

Measurement of the alkaline phosphatase activity 

[0454] The presence of activity is detected by the blue color of the 
colonies growing on a culture medium containing the substrate 
5-bromo-4-chloro-3-indolyl phosphate (XP), and then the activity can be quantitatively measured 
more precisely in the following manner: 

[0455] M. smegmatis was cultured in an LB medium supplemented with 
0.05% Tween 80 (Aldrich) and kanamycin (20 µg/ml) at 37°C for 24 hours. The 
alkaline phosphatase activity was measured by the Brockman and Heppel 
method (Brockman et al., 1968) in a sonicated extract, with p-nitrophenyl 
phosphate as reaction substrate. The quantity of proteins was measured by the 
Bio-Rad assay. The alkaline phosphatase activity is expressed as arbitrary units 
(optical density at 420 nm × µg of protein⁻¹ × minutes⁻¹). 
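
The arbitrary unit defined above can be written as a one-line calculation. The following Python sketch is only an illustration of that definition; the function name and the numerical values in the example are hypothetical and do not come from the experiments described.

```python
def phoa_arbitrary_units(od420, protein_ug, minutes):
    """Alkaline phosphatase activity as OD(420 nm) per microgram of protein per minute."""
    if protein_ug <= 0 or minutes <= 0:
        raise ValueError("protein quantity and reaction time must be positive")
    return od420 / (protein_ug * minutes)


# Hypothetical reading: OD420 of 0.6 with 20 µg of protein after 30 minutes.
example_units = phoa_arbitrary_units(od420=0.6, protein_ug=20.0, minutes=30.0)
```

Normalizing by both protein quantity and reaction time allows extracts of different concentration, assayed for different durations, to be compared on the same scale.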

Measurement of the luciferase activity 

[0456] M. smegmatis was cultured in an LB medium supplemented with 
0.05% Tween 80 (Aldrich) and kanamycin (20 µg/ml) at 37°C for 24 hours and 
used in full exponential growth (OD at 600 nm between 0.3 and 0.8). The 
aliquots of bacterial suspensions were briefly sonicated and the cell extract was 
used to measure the luciferase activity. 25 µl of the sonicated extract were 
mixed with 100 µl of substrate (Promega luciferase assay system) automatically 
in a luminometer and the emitted light expressed in RLU (Relative Light Units). 
The bacteria were counted by serial dilutions of the original suspension on 
LB-kanamycin agar medium and the luciferase activity expressed in RLU/µg of 
bacterial protein or in RLU/10^ bacteria. 
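
The two normalizations mentioned above (per microgram of protein, or per number of bacteria counted by serial dilution) can be sketched as follows. This is a hedged illustration rather than the procedure actually used: the function names, the `per` parameter (the reference number of bacteria, whose exponent is not legible in the text) and all numerical values are assumptions.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Viable count of the undiluted suspension, estimated from one countable plate."""
    return colonies * dilution_factor / plated_volume_ml


def rlu_per_bacteria(rlu, bacteria_assayed, per=1000.0):
    """Light emission normalized to `per` bacteria (reference number assumed)."""
    return rlu * per / bacteria_assayed


def rlu_per_ug_protein(rlu, protein_ug):
    """Light emission normalized to the quantity of bacterial protein."""
    return rlu / protein_ug


# Hypothetical example: 150 colonies from 0.1 ml of a 1:100,000 dilution,
# and 7500 RLU read on 25 µl (0.025 ml) of the original suspension.
titer = cfu_per_ml(colonies=150, dilution_factor=1e5, plated_volume_ml=0.1)
bacteria_in_sample = titer * 0.025
normalized = rlu_per_bacteria(rlu=7500, bacteria_assayed=bacteria_in_sample)
```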

Construction of M. tuberculosis and M. bovis BCG genomic libraries 
[0457] The libraries were obtained essentially using pJVEDa,b,c, which are 
described above. 

[0458] Preparation of macrophages derived from bone marrow and 
infection with recombinant M. smegmatis. 

[0459] The macrophages derived from bone marrow were prepared as 
described by Lang et al., 1991. In summary, the bone marrow cells were 
removed from the femur of 6- to 12-week old C57BL/6 mice (Iffa-Credo, France). 
The cells in suspensions were washed and resuspended in DMEM enriched with 
10% fetal calf serum, 10% of conditioned L-cell medium and 2 mM glutamine, 
without antibiotics. 10® cells were inoculated on flat-bottomed 24-well Costar 
plates in 1 ml. After four days at 37°C in a humid atmosphere containing a CO2 
content of 10%, the macrophages were rinsed and reincubated for an additional 
two to four days. The cells of a control well were lysed with Triton X-100 at 0.1% 
in water and the nuclei enumerated. About 5x10^ adherent cells were counted. 
For the infection, M. smegmatis carrying the different plasmids was cultured to 
full exponential phase (OD at 600 nm between 0.4 and 0.8) and diluted to an OD 
of 0.1 and then 10-fold in a medium for macrophages. 1 ml was added to each 
well and the plates were centrifuged and incubated for four hours at 37°C. After 
three washes, the cells were incubated in a medium containing amikacin for two 
hours. After three new washes, the adherent infected cells were incubated in a 
macrophage medium overnight. The cells were then lysed in 0.5 ml of lysis 
buffer (Promega). 100 µl were sonicated and the light emitted was measured on 
25 µl. Simultaneously, the bacteria were enumerated by spreading on 
L-agar-kanamycin (20 µg/ml). The light emitted is expressed in RLU/10^ bacteria. 

Analyses of the databanks 

[0460] The nucleotide sequences were compared with EMBL and 
GenBank using the FASTA algorithm and the protein sequences were analyzed 
by similitude by means of the PIR and Swiss Prot databanks using the BLAST 
algorithm. 

[0461] Example 1: The pJVED vectors 

[0462] The pJVED vectors (Figure 51) are plasmids carrying an E. coli 
truncated phoA gene without initiation codon, signal sequence and regulatory 
sequence. The multiple cloning site (MCS) allows the insertion of fragments of 
the genes encoding potential exported proteins as well as their regulatory 
sequences. Consequently, the fusion protein may be produced and may exhibit 
an alkaline phosphatase activity if it is exported. Only the fusions in phase may 
be produced. Thus, the MCS was modified so that the fusions may be obtained 
in six reading frames. The firefly luciferase luc gene was inserted downstream of 
phoA. Because the complete gene was used, with its initiation codon but without 
any promoter, it ought to be expressed with phoA as in a synthetic 
operon. A new ribosome-binding site was inserted eight nucleotides upstream of 
the luc initiation codon. Two transcriptional terminators are present in the pJVED 
vectors, one upstream of the MCS and a second downstream of luc. These 
vectors are E. coli/mycobacterium transfer plasmids with a kanamycin-resistance 
gene as selectable marker. 
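
The requirement that only in-phase fusions are produced can be illustrated with simple modular arithmetic. The Python sketch below is a simplified model, not a description of the actual vectors: it assumes that pJVEDb and pJVEDc differ from pJVEDa only by one and two nucleotides removed at the junction, and the junction length passed in the example is a hypothetical value. It covers one insert orientation only and does not model how the six frames mentioned in the text are obtained.

```python
# Assumed model: nucleotides removed at the junction in each vector variant.
DELETED_NT = {"pJVEDa": 0, "pJVEDb": 1, "pJVEDc": 2}


def in_frame_variants(upstream_coding_nt, junction_nt):
    """Return the variants in which phoA stays in phase with the upstream start codon.

    upstream_coding_nt: nucleotides from the first base of the start codon
        to the last base of the cloned fragment.
    junction_nt: nucleotides between the fragment and the first phoA codon
        in the pJVEDa variant (hypothetical value supplied by the caller).
    """
    return [
        name
        for name, deleted in DELETED_NT.items()
        if (upstream_coding_nt + junction_nt - deleted) % 3 == 0
    ]


# Hypothetical fragments of 111, 112 and 113 coding nucleotides, junction of 12 nt.
choices = {n: in_frame_variants(n, 12) for n in (111, 112, 113)}
```

Under this model, exactly one of the three variants gives an in-phase fusion for any fragment length, which is why a library built in all three vectors can capture a fusion whatever the position of the cloning junction.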

[0463] phoA and luc function as in an operon, but export is necessary for 
the phoA activity. 

[0464] Four plasmids were constructed by insertion into the MCS of DNA 
fragments of diverse origin: 

[0465] In the first construct called pJVED/blaF, the 1.4 kb fragment is 
derived from the previously described plasmid pLA71 (Lim et al., 1995). This 
fragment, derived from the β-lactamase gene (blaF) of M. fortuitum D216 
(Timm et al., 1994), includes the hyperactive mutated promoter, the segment 
encoding 32 amino acids of the signal sequence and the first 5 amino acids of 
the mature protein. Thus, this construct includes the strongest promoter known 
in mycobacterium and the elements necessary for the export of the phoA fusion 
protein. Consequently, a strong light emission and a good phoA activity can be 
expected from this construct (cf. Figures 53 and 54). 

[0466] Into a second construct called pJVED/hsp18, a 1.5 kb fragment 
was cloned from the previously described plasmid pPM1745 (Servant et al., 1995). 
This fragment includes the nucleotides encoding the first ten amino acids of the 
18 kDa heat shock protein derived from Streptomyces albus (heat shock protein 
18, HSP 18), the ribosome-binding site, the promoter and, upstream, regulatory 
sites controlling its expression. This protein belongs to the alpha-crystallin 
family of low-molecular weight HSP (Verbon et al., 1992). Its homolog, derived 
from M. leprae, the 18 kDa antigen, is already known to be induced during 
phagocytosis by a murine macrophage of the J-774 cell line (Dellagostin et al., 
1995). Under standard culture conditions, pJVED/hsp18 shows a weak luc 
activity and no phoA activity (cf. Figures 53 and 54). 

[0467] In a third construct, called pJVED/P19kDa, the insert derived from 
pExp410 (Lim et al., 1995) was excised and cloned into the MCS of pJVEDa. 
This fragment includes the nucleotides encoding the first 134 amino acids of the 
known M. tuberculosis 19 kDa protein, together with its regulatory sequences. As 
has been demonstrated, this protein is a glycosylated lipoprotein (Garbe et al., 
1993; Herrmann et al., 1996). In Figures 53 and 54, a good luc activity 
corresponding to a strong promoter is observed for this construct, but the phoA 
activity is the strongest of the four constructs. The high phoA activity of this 
fusion protein with a lipoprotein is explained by the fact that it remains attached 
to the cell wall by its N-terminal end. 

[0468] In the fourth and last construct, called pJVED/erp, the insert is 
derived from pExp53 (Lim et al., 1995) and was cloned into the MCS of pJVEDa. 
pExp53 is the initial plasmid selected for its phoA activity and containing a portion 
of the M. tuberculosis erp gene which encodes a 28-kDa antigen. The latter 
includes the signal sequence, a portion of the mature protein and, upstream of 
the initiation codon, the ribosome-binding site. The promoter was mapped. A 
putative iron box of the fur type is present in this region and flanks the -35 region 
of the promoter (Berthet et al., 1995). As expected (Figures 53 and 54), this 
construct exhibits a good light emission and a good phoA activity. The fact that 
this fusion protein, unlike the fusion with the lipoprotein of 19 kDa, does not 
appear to be attached to the cell wall does not rule out the possibility that the 
native protein is associated with it. Furthermore, the C-terminal end of erp is 
absent from the fusion protein. 
Example 2: 

[0469] Construction of an M. tuberculosis genomic DNA library in the 
pJVED vectors and identification of one of the members of these libraries 
(DP428), induced during phagocytosis by murine macrophages derived from 
bone marrow. 

[0470] The various constructs were tested for their capacity to evaluate 
the intracellular expression of the genes identified by the expression of phoA. 
For this purpose, the luc activity is expressed in RLU for 10^ bacteria in axenic 
culture and/or under intracellular conditions. The induction or the repression 
following phagocytosis by the bone marrow-derived murine macrophages can be 
suitably evaluated by the measurement of specific activities. The results of two 
separate experiments are presented in Table 2. 
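
As a hedged illustration of how induction or repression can be read from specific activities, the Python sketch below computes the ratio of intracellular to extracellular luciferase activity. The interpretation of the "Induction" column as this ratio is an inference from the printed values of Table 2, not a statement found in the text, and the function and variable names are assumptions. Because the ratio is dimensionless, it does not depend on the reference number of bacteria used for normalization.

```python
def induction_factor(extracellular_rlu, intracellular_rlu):
    """Ratio of intracellular to extracellular luciferase specific activity."""
    if extracellular_rlu <= 0:
        raise ValueError("extracellular activity must be positive")
    return intracellular_rlu / extracellular_rlu


# (extracellular, intracellular) specific activities as printed in Table 2.
TABLE_2 = {
    "pJVED/blaF": (1477, 367),
    "pJVED/hsp18": (0.26, 6.8),
    "pJVED/DP428": (0.14, 4),
}

factors = {
    name: induction_factor(ext, intra) for name, (ext, intra) in TABLE_2.items()
}
```

A factor above 1 indicates induction after phagocytosis and a factor below 1 indicates repression; the computed values agree with the printed column to within rounding.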

[0471] The plasmid pJVED/hsp18 was used as positive control for the 
induction during the intracellular growth phase. Although the induction of the 
promoter by heating the bacterium at 42°C was not conclusive, the phagocytosis 
of the bacterium clearly leads to an increase in the activity of the promoter. In all 
the experiments, the intracellular luc activity was strongly induced, increasing by 
20 to 100-fold the initially weak basal activity (Servant, 1995). 

[0472] The plasmid pJVED/blaF was used as a control for nonspecific 
modulation during the phagocytosis. It was possible to detect weak variations 
which were probably due to changes in culture conditions. Whatever the case, 
these weak variations are not comparable to the induction observed with the 
plasmid pJVED/hsp18. 

[0473] All the members of the DNA library were tested by measuring the 
activity of the promoter during the intracellular growth. Among these, DP428 is 
strongly induced during phagocytosis (Tables 1 and 2). 

TABLE 1 

Construct      % Recovery   RLU/10^ extracellular   RLU/10^ intracellular   Induction
                            bacteria                bacteria
pJVED/blaF     0.5          1460                    1727                    1.2
pJVED/hsp18    0.6          8                       57                      7.1
pJVED/DP428    0.7          0.06                    18                      300

Construct      % Recovery           RLU/10^ extracellular   RLU/10^ intracellular   Induction
                                    bacteria                bacteria
               C57BL/6   Balb/C                             C57BL/6   Balb/C        C57BL/6   Balb/C
pJVED/blaF     7         1.1        662                     250       911           0.4       1.4
pJVED/hsp18    6.7       1.7        164                     261       325           1.6       2
pJVED/DP428    1.6       2.1        0.08                    1.25      3.3           15.6      41

TABLE 2 

Construct      % Recovery   RLU/10^ extracellular   RLU/10^ intracellular   Induction
                            bacteria                bacteria
pJVED/blaF     22           1477                    367                     0.25
pJVED/hsp18    7            0.26                    6.8                     26
pJVED/DP428    21           0.14                    4                       28



[0474] The nucleotide fragment encoding the N-terminal region of the 
polypeptide DP428 having the sequence SEQ ID NO: 543 is contained in the 
plasmid deposited at the CNCM under the No. I-1818. 

[0475] The entire sequence encoding the polypeptide DP428 was 
obtained as detailed below. 

[0476] A probe was obtained by PCR with the aid of oligonucleotides 
having the sequence SEQ ID NO: 528 and SEQ ID NO: 529. This probe was 
labeled by random extension in the presence of [³²P]dCTP. Hybridization of the 
genomic DNA of M. tuberculosis strain Mt103 previously digested with the 
endonuclease ScaI was carried out with the aid of said probe. The results of the 
hybridization revealed that a DNA fragment of about 1.7 kb was labeled. 
Because a ScaI site exists, extending from the nucleotide nt 984 to the 
nucleotide nt 989 of the sequence SEQ ID NO: 1, that is to say on the 5' side of 
the sequence used as probe, the end of the coding sequence is necessarily 
present in the fragment detected by hybridization. 

[0477] The genomic DNA of the M. tuberculosis Mt103 strain, after 
digestion with ScaI, was subjected to migration on agarose gel. The fragments 
of between 1.6 and 1.8 kb in size were cloned into the vector pSL1180 
(Pharmacia) previously cleaved with ScaI and dephosphorylated. After 
transformation of E. coli with the resulting recombinant vectors, the colonies 
obtained were screened with the aid of the probe. The screening made it 
possible to isolate six colonies hybridizing with this probe. 

[0478] The inserts contained in the plasmids of the previously selected 
recombinant clones were sequenced and then the sequences aligned so as to 
determine the entire sequence encoding DP428, more specifically SEQ ID NO: 
35. 

[0479] A pair of primers was synthesized in order to amplify, starting 
with the genomic DNA of M. tuberculosis, strain Mt103, the entire sequence 
encoding the polypeptide DP428. The amplicon obtained was cloned into an 
expression vector. 

[0480] Pairs of primers appropriate for the amplification and the cloning 
of the sequence encoding the polypeptide DP428 can be easily produced by 
persons skilled in the art, on the basis of the nucleotide sequences SEQ ID NO: 
1 and SEQ ID NO: 35. 

[0481] A specific pair of primers according to the invention is the 
following pair of primers, which is capable of amplifying the DNA encoding the 
polypeptide DP428 lacking its signal sequence: 

[0482] - forward primer (SEQ ID NO: 917), comprising the sequence 
going from the nucleotide at position nt 531 to the nucleotide nt 554 of the 
sequence SEQ ID NO: 35: 

[0483] 5'-AGTGCAT GCTGCTGGCCGAACCATCAGCGAC-3' 

[0484] - backward primer (SEQ ID NO: 918), comprising the sequence 
complementary to the forward sequence of the nucleotide at position nt 855 to 
the nucleotide at position nt 835 of the sequence SEQ ID NO: 35: 

[0485] 5'-CAGCCAGATC TGCGGGCGCCCTGCACCGCCTG-3', 

[0486] in which the portion underlined represents the sequences 
hybridizing specifically with the sequence SEQ ID NO: 35 and the 5' ends 
correspond to restriction sites for the cloning of the resulting amplicon into a 
cloning and/or expression vector. 

[0487] A specific vector used for the expression of the polypeptide 
DP428 is the vector pQE70 marketed by the company Qiagen. 

Example 3: 

[0488] The complete sequence of the DP428 gene and its flanking 
regions. 

[0489] A probe of the coding region of DP428 was obtained by PCR and 
used to hybridize the genomic DNA of various mycobacterial species. According 
to the results of Figure 3, the gene is present only in mycobacteria of the 
M. tuberculosis complex. 

[0490] Analysis of the sequence suggests that DP428 could be part of an 
operon. The coding sequence and the flanking regions exhibit no homology with 
known sequences deposited in databanks. 

[0491] Based on the coding sequence, the gene encodes a 10 kDa 
protein with a signal peptide, a hydrophobic C-terminal end which ends with two 
arginines and is preceded by an LPISG (SEQ ID NO: 934) motif similar to the 
known LPXTG (SEQ ID NO: 935) motif. These two arginines could correspond 
to a retention signal and the protein DP428 could be attached via this motif to 
peptidoglycans as has already been described in other Gram-positive bacteria 
(Navarre et al., 1994 and 1996). 
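
The features described above (a hydrophobic C-terminal region preceded by a motif resembling LPXTG) can be looked for computationally. The Python sketch below is a minimal illustration using the published Kyte and Doolittle hydropathy scale; the window size, the function names, the regular expression accepting either T or S at the fourth position, and the short test peptide are all assumptions made for this example. The test peptide is invented and is not the DP428 sequence.

```python
import re

# Kyte and Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy over each sliding window; empty if seq is shorter than window."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def find_lpxtg_like(seq):
    """Return (0-based position, motif) for LPXTG motifs and the LPXSG variant."""
    return [(m.start(), m.group()) for m in re.finditer(r"LP.[TS]G", seq.upper())]


# Invented peptide for illustration only.
toy = "MKRTAALPISGLLVVAAIALLVRR"
motifs = find_lpxtg_like(toy)
motif_score = hydropathy_profile("LPISG", window=5)[0]
```

Positive window averages indicate hydrophobic stretches; a run of high values near the C-terminal end, downstream of the motif, would be consistent with the arrangement described in the text.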

[0492] The mechanism for survival and intracellular growth of 
mycobacteria is complex and the intimate relationships between the bacteria and 
the host cell remain unexplained. Whatever the mechanism, the growth and the 
intracellular survival of mycobacteria depend on factors produced by the 
bacterium and capable of modulating the response of the host. 
These factors may be molecules which are exposed at the cell surface, such as 
LAM or cell surface-associated proteins, or actively secreted molecules. 

[0493] On the other hand, intracellularly, the bacteria themselves have to
confront a hostile environment. They appear to respond to this by means similar
to those used under stress conditions, by inducing heat shock proteins
(Dellagostin et al., 1995), but also by the induction or the repression of various
proteins (Lee et al., 1995). Using a methodology derived from PCR, Plum and
Clark-Curtiss (Plum et al., 1994) have shown that an M. avium gene included in a
3 kb DNA fragment is induced after phagocytosis by human macrophages. This
gene encodes an exported protein comprising a leader sequence but exhibiting
no significant homology with the sequences available in databanks. The
induction, during the intracellular growth phase, of a low-molecular-weight heat
shock protein derived from M. leprae has also been demonstrated (Dellagostin et
al., 1995). In another study, the bacterial proteins from M. tuberculosis were
metabolically labeled during the intracellular growth phase or under stress
conditions and separated by two-dimensional gel electrophoresis: 16 M.
tuberculosis proteins were induced and 28 were repressed. The same proteins
are involved during stress caused by a low pH, a heat shock, H2O2, or during
phagocytosis by human monocytes of the THP1 line. Whatever the case, the
behavior of the induced and repressed proteins was unique under each condition
(Lee et al., 1995). Taken together, these results indicate that a subtle molecular
dialogue is established between the bacteria and their host cells. This dialogue
probably depends on the fate of the intracellular organism.

[0494] In this context, the induction of the expression of DP428 could be 
of major importance, indicating an important role for this protein in intracellular 
survival and growth. 

[0495] The method used in these experiments to evaluate the
intracellular expression of the genes (cf. Jacobs et al., 1993, for the method for
determining the expression of firefly luciferase, and Lim et al., 1995, for the
method for determining the expression of the PhoA gene) has the advantage of
being simple compared with the other techniques such as the technique
described by Mahan et al. (Mahan et al., 1993) adapted to mycobacteria and
proposed by Bange et al. (Bange et al., 1996) or the subtractive method based
on PCR described by Plum and Clark-Curtiss (Plum et al., 1994). Variability
undoubtedly exists as shown by comparing the various experiments. Although
causing the induction or the repression is sufficient, it is now possible to evaluate
it, thus providing an additional tool for the physiological studies of the exported
proteins identified by fusion with phoA.

Example 4: 

[0496] Search for modulation of the activity of the promoters during the 
intramacrophage phases. 

[0497] Mouse bone marrow macrophages are prepared as described by
Lang and Antoine (Lang et al., 1991). Recombinant M. smegmatis bacteria,
whose luciferase activity per 10^ bacteria has been determined as above, are
incubated at 37°C under a humid atmosphere enriched with 5% CO2, for 4 hours
in the presence of these macrophages such that they are phagocytosed. After
rinsing in order to remove the remaining extracellular bacteria, amikacin
(100 µg/ml) is added to the culture medium for two hours. After another rinsing,
the medium is replaced with an antibiotic-free culture medium (DMEM enriched
with 10% calf serum and 2 mM glutamine). After overnight incubation as above,
the macrophages are lysed at low temperature (4°C) with the aid of a lysis buffer
(cee lysis buffer, Promega), and the luciferase activity per 10^ bacteria is
determined. The ratio of the activities at placing in culture and after one night
gives the coefficient of induction.
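
The coefficient of induction described above is a simple ratio of two normalized luciferase activities. The following is a minimal sketch, assuming that the ratio is taken as the activity after overnight incubation divided by the activity at placing in culture (the text does not state the direction explicitly); the function name and the numerical values are hypothetical.

```python
def induction_coefficient(activity_at_start: float, activity_after_overnight: float) -> float:
    """Ratio of normalized luciferase activities.

    Assumption: a value above 1 is read as induction inside the
    macrophage and a value below 1 as repression. Both activities must
    be expressed per the same number of bacteria.
    """
    if activity_at_start <= 0:
        raise ValueError("the starting activity must be positive")
    return activity_after_overnight / activity_at_start


# Hypothetical readings in arbitrary light units per fixed number of bacteria.
print(induction_coefficient(activity_at_start=200.0, activity_after_overnight=800.0))
```

Because both measurements are normalized to the same number of bacteria, the ratio reflects a change in promoter activity rather than a change in bacterial count.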

Example 5:

[0498] Isolation of a series of sequences by sequencing directly using 
colonies. 

[0499] A series of sequences allowing the expression and export of phoA
were isolated from the DNA of M. tuberculosis or of M. bovis BCG. Among this
group of sequences, two were studied further: the entire genes
corresponding to the inserts were cloned and sequenced, and antibodies against
the products of these genes served to show by electron microscopy that these
genes encode antigens found at the surface of the tubercle bacilli. One of these
genes, erp, encodes a consensus export signal sequence; the other, des,
possesses no characteristic of a gene encoding an exported protein, based on
the sequence. Another gene, DP428, was sequenced before the sequence of
the M. tuberculosis genome became available. It contains a sequence
resembling the consensus sequence for attachment to peptidoglycan, which
suggests that it is also an antigen which is probably found at the surface of the
tubercle bacilli. The study of the three genes, erp, des, and that encoding
DP428, shows that the phoA system which we have developed in mycobacteria
makes it possible to pick out genes encoding exported proteins with no
determinant which can be picked out by studies in silico. This is particularly true
for the polypeptides which do not possess a consensus signal sequence (des) or
have no similarity with proteins having a known function (erp and DP428).

[0500] A number of inserts were identified and sequenced before
knowing the genome of M. tuberculosis or of others below. These sequences
may be considered as primers which make it possible to search for genes
encoding exported proteins. To date, a series of primers have been sequenced
and the entire corresponding genes have been either sequenced or identified
based on the published sequence of the genome. To take into account
sequencing errors which are always possible, the regions upstream or
downstream of some primers were considered as being capable of forming part
of sequences encoding exported proteins. In some cases, similarities with genes
encoding exported proteins or sequences characteristic of export signals or
topological characteristics of membrane proteins were detected.

[0501] Primer sequences are found to correspond to genes belonging to
families of genes possessing more than 50% similarity. It is thus possible to
indicate that the other genes detected by similarity with a primer encode exported
proteins. This is the case for the sequences SEQ ID NO: 154 and SEQ ID NO:
156 which possess more than 77% similarity with SEQ ID NOS: 137 and 143.

[0502] The sequences which may encode exported proteins are the
following: SEQ ID NOS: 1, 8, 14, 25, 31, 33, 137, 139, 141, 143, 145, 148, 150,
152, 154, 156, 158, 160, 162, 225, 228, 238, 246, 250, 255, 258, 260, 41, 46, 52,
165, 169, 177, 407, 410, 412, 419, 421, 426, 429, 431, 433, 437, 441, 447, 452,
456, 459, 461, 110, 113, 119, 353, 357, 359, 489, 495, 497, 501, 505, 510, 516,
519, 522, 651, 653, 657, 660, 662, 759, 761, 764, 767, 769, 811, 813, 817, 821,
823, 887, 895, 901, 907, and 909.

[0503] Genes identified based on the primers from the sequence of the
genome have no characteristic (based on the sequence) of the exported
proteins. They are the following sequences: SEQ ID NOS: 57-61, 63, 65-66, 68,
70-71, 73, 75, 77, 79-80, 82-83, 85, 87, 531-533, 535-536, 538-542, 185-188,
190-194, 196-199, 201, 203-205, 207-208, 210, 212, 214-216, 218-219, 221-224,
263-267, 269-271, 275-277, 279, 281, 283, 285, 287, 289, 291-296, 298-309,
311-316, 123-127, 129-132, 134-136, 318-320, 322, 324, 326, 328-330, 332,
334, 336, 338, 340-345, 348-352, 362-363, 365-367, 369, 370, 372-373,
375-379, 381-382, 384, 386, 388, 390-392, 394, 396, 398, 400-402, 404, 406,
464-468, 470-471, 473, 475, 477-481, 483-484, 486, 488, 547-549, 551, 553, 555,
557, 559-563, 565-568, 570, 572, 574-575, 577-579, 581-583, 585, 587, 589,
591-593, 595, 597, 599, 601-603, 605-607, 609, 611, 613, 615, 617, 619, 621,
623, 625, 627-628, 630, 632, 634, 636-639, 641-646, 648, 650, 665, 931-933,
667-668, 670-673, 675, 677, 679-682, 684-685, 687-690, 692, 694, 696,
698-701, 703-716, 718-727, 729-732, 734-735, 737, 738, 740, 742, 744-745,
747-751, 753-754, 756, 758, 772-780, 781-783, 785-793, 795-804, 806, 808, 810,
826, 828-830, 832, 834, 836, 838, 840-841, 843, 845, 847, 849-863, 865-877,
879-882, 884, and 886.
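
Lists of sequence identifiers such as the one above mix single numbers with ranges, which makes them awkward to compare by eye. The following is a minimal sketch of a helper that expands such a list into individual identifiers; the function name `expand_seq_ids` is hypothetical and the input is a short excerpt of the list, not the full set.

```python
def expand_seq_ids(text: str) -> list[int]:
    """Expand a comma-separated list of numbers and ranges ("57-61, 63")."""
    ids: list[int] = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start, end = (int(part) for part in item.split("-"))
            ids.extend(range(start, end + 1))  # ranges are inclusive
        else:
            ids.append(int(item))
    return ids


# Excerpt taken from the beginning of the list of SEQ ID NOS.
print(expand_seq_ids("57-61, 63, 65-66, 68"))
```

Once expanded, two such lists can be compared with ordinary set operations, for example to check whether an identifier appears in more than one group.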

[0504] Based on the sequence of other organisms such as E. coli, it is
possible to search in the sequence of the M. tuberculosis genome for genes
possessing similarities with proteins known to be exported in other organisms
although not possessing an export signal sequence. In this case, fusion with
phoA is an advantageous protocol for determining if these M. tuberculosis
sequences encode exported proteins although possessing no consensus signal
sequence. It has indeed been possible to clone SEQ ID NOS: 848, 864, 878,
883, and 885, sequences similar to an E. coli gene of the htrA family. A fusion of
SEQ ID NOS: 848, 864, 878, 883, and 885 with phoA leads to the expression
and the export of phoA. M. smegmatis colonies harboring the SEQ ID NOS: 848,
864, 878, 883, and 885 phoA fusion on a plasmid pJVED are blue.

[0505] SEQ ID NOS: 849-863, 865-877, 879, 880-882, 884, and 886 are 
therefore considered exported proteins. 

[0506] The phoA method is therefore useful for detecting, based on the 
M. tuberculosis sequence, genes encoding exported proteins without them 
encoding sequences which are characteristic of the exported proteins. 

[0507] Even if a sequence possesses determinants of exported proteins,
this does not demonstrate a functional export. The phoA system makes it
possible to show that the gene suspected really encodes an exported protein.
Thus, it was checked that the sequences SEQ ID NOS: 887, 895, 901, 907, and
909 indeed possessed export signals.
TABLE 3

SEQ ID No. | Reference of the corresponding sequence predicted by Cole et al. | Prediction (* / ML) | Annotation
SEQ ID NOS: 2-7, 9-13, 15-24, 26-30, 32, 34 | Rv 0203 | * | Sequence hydrophobic at the N-terminus
SEQ ID NOS: 57-61, 63, 65-66, 68, 70-71, 73, 75, 77, 79-80, 82-83, 85, 87, 531-533, 535-536, 538-542 | Rv 2050 | | No prediction
SEQ ID NOS: 138, 272-273, 140, 142, 144, 146-147, 149, 151, 153, 155, 157, 159, 161, 163-164 | Rv 2563 | * | Membrane protein
SEQ ID NOS: 155, 157 | Rv [illegible] | * | [illegible] transport protein of the ABC type
SEQ ID NOS: 185-188, 190-194, 196-199, 201, 203-205, 207-208, 210, 212 | Rv 0546c | ML | Protein S-D Lactoyl Glutathione-methyl glyoxal lyase
SEQ ID NOS: 214-216, 218, 219, 221-224 | no prediction | | not found in M. tuberculosis H37Rv
SEQ ID NOS: 226-227, 923-925, 229-237, 239-245, 247-249, 251-254, 256-257, 259, 261, 42-45, 47-51, 53-55, 166-168, 170-176, 178-183 | Rv 1984c | * | probable precursor cutinase with an N-terminal signal sequence
SEQ ID NOS: 263-267, 269-271, 275-277, 279, 281, 283, 285, 287, 289, 291-296, 298-309, 311-316, 123-127, 129-132, 134-136 | no prediction | | no prediction
SEQ ID NOS: 318-320, 322, 324, 326, 328-330, 332, 334, 336, 338, 340-345, 348-352 | with reading frame shift, could be in phase with Rv 2530c | | no prediction
SEQ ID NOS: 362-363, 365-367, 369-370, 372-373, 375-379, 381-382, 384, 386 | Rv1303 | ML | no prediction
SEQ ID NOS: 388, 390-392, 394, 396, 398, 400-402, 404, 406 | Rv0199 | ML | no prediction
SEQ ID NOS: 408-409, 411, 413-418, 420, 422-425, 427-428, 430, 432 | Rv0418 | * | site for attachment of prokaryotic membrane lipoprotein, similarity with N-acetyl puromycin acetyl hydrolase
SEQ ID NOS: 434-436, 438-440, 442-446, 448-451, 453-455, 457-458, 460, 462, 111-112, 114-118, 120-121 | Rv 3576 | * | contains a site for attachment of prokaryotic membrane lipoprotein, similarity with a serine/threonine protein kinase
SEQ ID NOS: 464-468, 470-471, 473, 475, 477-481, 483-484, 486, 488 | Rv 3365c | ML | similarity with a zinc metallopeptidase
SEQ ID NOS: 547-549, 551, 553, 555 | not predicted | | no prediction
SEQ ID NOS: 557, 559-563, 565-568, 570, 572 | Rv 0822c | ML | Existence of a consensus region with the drac family
SEQ ID NOS: 574-575, 577-579, 581-583, 585, 587 | Rv 1044 | | no prediction
SEQ ID NOS: 589, 591-593, 595, 597 | not predicted | | no prediction
SEQ ID NOS: 599, 601-603, 605-607, 609, 611 | Rv2169c | | no prediction
SEQ ID NOS: 613, 615, 617, 619, 621 | Rv 3909 | ML | no prediction
SEQ ID NOS: 623, 625, 627-628, 630, 632 | Rv 2753c | | similarity with dihydropicolinate synthases
SEQ ID NOS: 634, 636-639, 641-646, 648, 650 | Rv 0175 | | no prediction
SEQ ID NOS: 652, 654-656, 658-659, 661, 663 | Rv 3006 | *, ML | prediction of lipoprotein signal sequence
SEQ ID NOS: 665, 931-933, 667-668, 670-673, 675, 677 | Rv 0549c | | no prediction
SEQ ID NOS: 679-682, 684-685, 687-690, 692, 694, 696, 698-701, 703-716, 718-727 | Rv 2975c being capable of being in phase with Rv 2974c | | similarity with substilis protein
SEQ ID NOS: 729-732, 734-735, 737-738, 740, 742 | Rv 2622 | | similarity with a methyl transferase
SEQ ID NOS: 744-745, 747-751, 753-754, 756, 758 | Rv 3278c | ML | no prediction
SEQ ID NOS: 760, 762-763, 765-766, 768, 770 | Rv 0309 | * | no prediction
SEQ ID NOS: 772-783, 785-793, 795-804, 806, 808, 810 | Rv 2169c | ML | no prediction
SEQ ID NOS: 812, 814-816, 818-820, 822, 824 | Rv 1411c | * | probable lipoprotein with an N-terminal signal sequence
SEQ ID NOS: 826, 828-830, 832, 834, 836 | Rv1714 | | similarity with a gluconate 3-dehydrogenase
SEQ ID NOS: 838, 840-841, 843, 845, 847 | Rv 0331 | | similarity with a sulfide dehydrogenase and a sulfide quinone reductase
SEQ ID NOS: 849-863, 865-877, 879-882, 884, 886 | Rv 0983 | ML | similarity with a serine protease HtrA
SEQ ID NOS: 89, 91, 93-95, 97, 99, 101-103, 105, 107, 109 | | |
SEQ ID NOS: 354-356, 358, 360, 926-930 | Rv3810 | *, ML | Surface protein (Berthelet et al., 1995)
SEQ ID NOS: 490-494, 496, 498-500, 502-504, 506-509, 511-515, 517-518, 520-521, 523-527 | Rv 3763 | * | Contains a site for attachment of eukaryotic membrane lipoprotein
SEQ ID NOS: 888-894, 896-900, 902-906, 908, 910 | Rv0125 | * | Active site of serine proteases; possible N-terminal signal sequence

Legend to Table 3: 

[0508] Correspondence between the sequences according to the 
invention and the sequences predicted by Cole et al., 1998, Nature, 393, 537- 
544. 

[0509] *: Prediction that the protein encoded by the sequence 
is exported. 

[0510] ML: Prediction of similarity with M. leprae.

Example 6: 

[0511] Characteristics and production of the protein M1C25.

[0512] The N-terminal end of the protein M1C25 was detected by 
the PhoA system as allowing the export of the fusion protein, necessary for the 
production of its phosphatase activity. 

[0513] The DNA sequence encoding the N-terminal end of the 
protein M1C25 is contained in the sequences SEQ ID NOS: 433, 437, 441, 447, 
452, 456, 459, 461 of the present patent application. 

[0514] From this primer sequence, the complete gene encoding the 
protein M1C25 was sought in the M. tuberculosis genome (Wellcome Trust 
Foundation, Sanger site). 

[0515] The Sanger center attributed to M1C25 the names: 

[0516] Rv3576, 

[0517] MTCY06G11.23,

[0518] pknM 

Sequence SEQ ID NO: 544 of the complete M1C25 gene (714 bases): cf. 
Figure 29 

[0519] This gene encodes a protein of 237 AA, having a molecular 
mass of 25 kDa. This protein is listed in the libraries under the names: 
[0520] PID:e306716, 
[0521] SPTREMBL:P96858 
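
The figures quoted for the gene and the protein can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the 714 bases include the stop codon and using the common rule of thumb of about 110 Da per amino acid residue; both assumptions, and the function name `coding_summary`, are not taken from the original text.

```python
def coding_summary(gene_length_nt: int, avg_residue_mass_da: float = 110.0) -> tuple[int, float]:
    """Return (number of residues, approximate mass in kDa) for a coding sequence.

    Assumptions: the length includes one stop codon, and every residue
    weighs the average value given (a rough estimate only).
    """
    if gene_length_nt % 3 != 0:
        raise ValueError("a coding sequence should be a multiple of three bases")
    codons = gene_length_nt // 3
    residues = codons - 1  # the stop codon encodes no amino acid
    return residues, residues * avg_residue_mass_da / 1000.0


residues, mass_kda = coding_summary(714)
print(residues, round(mass_kda, 1))
```

Under these assumptions a gene of 714 bases corresponds to 237 residues, and the estimated mass is of the same order as the 25 kDa stated for the protein.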

Sequence SEQ ID NO: 545 of the protein M1C25 (235 amino acids): cf.
Figure 30

[0522] M1C25 contains a site for attachment to the lipid portion of
the prokaryotic membrane lipoproteins (PS00013 Prokaryotic membrane 
lipoprotein lipid attachment site: 

[0523] CTGGTCGGTG CGTGCATGCT CGCAGCCGGA TGC)(SEQ ID 
NO: 919). 

[0524] The function of M1C25 is not clear but it most probably possesses 
a "serine/threonine protein kinase" activity. Similarities should be noted with the 
C-terminal moiety of K08G_MYCTU 01 1053 Rv1266c (MTCY50.16). Similarities 
are also found with KY28_MYCTU. 

[0525] A gene potentially encoding a regulatory protein (PID:e306715, 
SPTREMBL:P96857, Rv3575c, (MTCY06G11.22c)) is found in 5' of the gene 
encoding M1C25. 

[0526] The hydrophobicity profile (Kyte and Doolittle) of M1C25 is
represented in Figure 56. 

[0527] A site of cleavage of the signal sequence is predicted (SignalP 
V1.1; World Wide Web Prediction Server, Center for Biological Sequence
Analysis) between amino acids 31 and 32: AVA-AD. This cleavage site is behind 
a conventional "AXA" motif. This prediction is compatible with the hydrophobicity 
profile. In this potential signal sequence, it is observed that the sequence of the 
three amino acids LAA is repeated three times. 
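
The features noted for the predicted signal sequence (an "AXA" motif immediately before the cleavage site and a repeated LAA triplet) are easy to check programmatically once a protein sequence is at hand. The following is a minimal sketch; the function name `check_signal_cleavage` is hypothetical, and the sequence `TOY_PROTEIN` is an invented illustration that only mimics these features. It is not the M1C25 sequence.

```python
import re


def check_signal_cleavage(protein: str, cleavage_after: int) -> dict:
    """Inspect the region before a predicted signal peptidase cleavage site.

    cleavage_after is the 1-based position of the last residue of the
    signal sequence (31 for a cleavage between residues 31 and 32).
    """
    signal = protein[:cleavage_after]
    last_three = signal[-3:]
    return {
        # "AXA": alanine, any residue, alanine just before the cut
        "axa_before_cleavage": re.fullmatch(r"A.A", last_three) is not None,
        "junction": f"{last_three}-{protein[cleavage_after:cleavage_after + 2]}",
        "laa_count": signal.count("LAA"),
    }


# Invented sequence for illustration only (31-residue signal + mature start).
TOY_PROTEIN = "MKQLAALAALAAGCMLVGTSGCSSPTTSAVA" + "ADTPGKE"
result = check_signal_cleavage(TOY_PROTEIN, cleavage_after=31)
print(result)
```

Applied to the real sequence of Figure 30, the same check would be expected to report the AVA-AD junction described above, provided the prediction is accurate.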

[0528] Cloning of the M1C25 gene for the production of the protein which
it encodes:

[0529] A pair of primers were synthesized in order to amplify, using the 
genomic DNA of M. tuberculosis, strain H37Rv, the entire sequence encoding the
polypeptide M1C25. The amplicon obtained was cloned into an expression 
vector. 

[0530] Pairs of primers appropriate for the amplification and the cloning
of the sequence encoding M1C25 were synthesized:

[0531] - forward primer: (SEQ ID NO: 920)
[0532] 5'-ATAATACCA TGGGCAAGCAGCTAGCCGCGC - 3'
[0533] - backward primer: (SEQ ID NO: 921)
[0534] 5'-ATTTATAGATCT CTGCTTAGCAAGCTTGGCCGCG - 3'

[0535] The underlined portion represents the sequences specifically
hybridizing with the M1C25 sequence and the 5' ends correspond to restriction
sites for the cloning of the resulting amplicon into a cloning and/or expression
vector.
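
The restriction sites carried at the 5' ends of the two primers can be located by a plain string search. The following is a minimal sketch: the primer sequences are those given above with the internal space removed, while the choice of enzymes (NcoI, recognizing CCATGG, and BglII, recognizing AGATCT) is an assumption made for this illustration, since the text does not name the enzymes. The function name `find_sites` is hypothetical.

```python
# Recognition sequences of two common restriction enzymes (assumed relevant here).
SITES = {"NcoI": "CCATGG", "BglII": "AGATCT"}

FORWARD_PRIMER = "ATAATACCATGGGCAAGCAGCTAGCCGCGC"        # SEQ ID NO: 920
BACKWARD_PRIMER = "ATTTATAGATCTCTGCTTAGCAAGCTTGGCCGCG"   # SEQ ID NO: 921


def find_sites(primer: str, sites: dict[str, str] = SITES) -> dict[str, int]:
    """Return the 0-based start position of each site present in the primer."""
    primer = primer.replace(" ", "").upper()
    return {name: primer.find(seq) for name, seq in sites.items() if seq in primer}


print(find_sites(FORWARD_PRIMER))
print(find_sites(BACKWARD_PRIMER))
```

In both primers the site found starts after six leading bases, which is consistent with a short 5' overhang placed ahead of the restriction site to help the enzyme cut near the end of the amplicon.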

[0536] A specific vector used for the expression of the polypeptide
M1C25 is the vector pQESO marketed by the company Qiagen, following the
protocol and the recommendations proposed by this brand.

[0537] The cells used for the cloning are bacteria: E. coli XL1-Blue
(resistant to tetracycline).

[0538] The cells used for the expression are bacteria: E. coli M15
(resistant to kanamycin) containing the plasmid pRep4 (M15 pRep4).

[0539] The production of the protein M1C25 is illustrated by Figures 57 A
and B (bacterial extracts from the E. coli M15 strain containing the plasmid
pM1C25). The bacterial cultures and the extracts are prepared according to
Sambrook et al. (1989). Analysis of the bacterial extracts is carried out according
to the Qiagen instructions (1997).